Health plan


Proposals to improve the international emergency response to disease outbreaks in the wake of the
Ebola epidemic should be implemented, but local solutions are the best defence.


As West Africa starts to recover from the worst of the Ebola
epidemic, how can scientists, health experts and politicians
ensure that the world is better prepared next time? The global
failure to respond quickly and effectively to the epidemic has prompted
much soul searching, and already ideas about what needs to change
are emerging. The danger is that the political will to realize such
solutions, including better-tailored research agendas and streamlined
clinical-trials regulation, will be harder to find as Ebola fades from
the headlines and public concern moves on.

It is good news, then, that plans to improve global disease response
are in the pipeline, and could be announced as soon as this weekend.
Prepared under the leadership of Germany’s Chancellor Angela Merkel,
the proposals are expected to be presented at the annual summit of
the G7 group of leading industrialized countries, due to take place in
Elmau, Germany, on 7-8 June. The package of measures is a promising
sign that politicians at the highest level take the threat of infectious
disease seriously, and it lays out some sensible suggestions.

Surveillance is key. Ebola first emerged in West Africa in December
2013, but was not identified until the following March, a delay that
allowed it to spread out of control. The new proposals aim to boost
surveillance systems by providing low- and middle-income countries
with US$12 million to $15 million annually. That might not sound
much, but experience in poorer countries shows that a little money can
go a long way in building capacity, such as a trained workforce. In many
countries, new investments can also piggyback on existing networks.

Another lesson of Ebola is the dire need for modern tools such as
diagnostics, drugs and vaccines. Yet the world’s biomedical research
and development system is simply not geared towards generating
products for which there is little or no market.


BE PREPARED 

The G7 plan is expected to address this, initially with a survey of 
potential threats and an audit of available candidate drugs and vaccines. 
A funding pool of $50 million to $100 million annually would then 
take the most promising candidates through to phase I clinical trials 
to test their safety. This would mean that, in the event of an outbreak, 
they could be taken directly into clinical trials to test their efficacy in 
the field. This is a sensible proposal. One of the most frustrating aspects 
of the Ebola epidemic was that several potential drugs and vaccines 
existed but had not undergone phase I trials. 

Public-private partnerships created over the past 15 years have
encouraged the research and development of products to address 
neglected diseases and others that do not have large markets. They 
can also serve as a hub for coordinating research and attracting further 
funds. In principle, therefore, if similar partnerships were aimed at 
potential threats, they could accelerate product development. Research 
agencies must not only step up support for work on such threats, but 
also translate this into medical countermeasures (see page 18). 


Another welcome proposal is to speed up trials by developing 
protocols and experimental designs before outbreaks occur, and to have 
these pre-approved by regulators, so that when there is an outbreak, 
trials could start immediately. For Ebola, such trials were agreed by 
researchers, regulatory authorities and affected countries only after the 
epidemic was under way. (To their credit, everyone involved pulled out
all the stops to cooperate and fast-track the process. They agreed on
protocols and experimental designs in a matter of months, a procedure that
normally takes years.)


“The ultimate goal must be functional health systems in every country.”

The proposals also call for $150 million to
$200 million per year to create a reserve force of
10,000 scientists and health-care workers who
can be rapidly deployed during an outbreak,
a sort of United Nations ‘Blue Helmets’
for health. The oft-touted idea has merit, but
should not distract from the more fundamental need to expand the 
global workforce of disease researchers and health-care workers. 

Who would be in charge of such a force? Indeed, who would 
coordinate the other measures if they are approved? And how will they 
sit with existing initiatives, such as the US-led Global Health Security 
Agenda coalition of several dozen countries and organizations that was 
launched in February last year? 

The new proposals call for the creation of a $40-million-a-year 
multilateral organization that would be responsible for global outbreak 
response. Housed within the World Health Organization (WHO), the 
unit would be autonomous enough to avoid WHO bureaucracy. It 
would have a mandate to link to UN agencies, the World Bank and other 
organizations — including non-governmental organizations, industry 
and philanthropies. Such inclusiveness has too often been insufficient. 

The G7 and other countries should back the suggested moves. 
Governments and organizations should find the cash required. But all 
involved should remember that international systems for responding 
to outbreaks are only part of the picture — as starkly exposed by Ebola. 
What is most important is having robust health systems on the front line. 

At the World Health Assembly, the annual meeting of the health 
ministers of WHO member states, which was held last month in 
Geneva, Switzerland, Merkel rightly said that the ultimate goal must 
be functional health systems in every country. That was the explicit 
aim of the revised International Health Regulations, adopted in 2007 
by 196 countries, including WHO member states, which committed to 
targets for surveillance and emergency-response infrastructure. 

Yet most countries missed the 2012 deadline for achieving these 
targets, with only 64 countries reporting being up to speed as of this 
year. At the Geneva meeting, the deadline for the remainder was 
extended yet again, to 2016. In the current flurry of interest in
emergency initiatives, society should not forget that what is needed most is
long-term investment in research and health-care systems in all
countries, so that they can better respond to disease threats themselves.
Misplaced faith


The public trusts scientists much more than 
scientists think. But should it? 


British chemists are a diffident and self-conscious bunch. A poll
by the Royal Society of Chemistry (RSC) has revealed that its
members are pessimistic about their status in society. The general
public, said the chemists, thinks that chemistry is boring and of little
value. Worse, they said, the public thinks chemists are unapproachable.
Such negative views have shaped the way that British chemists have
promoted themselves and their research; they have focused on
counteracting a negative and damaging stereotype about chemistry and chemicals.

Yet as RSC science communicator Chiara Ceci writes in a World
View on page 7, the British public does not think these things at all. In
another part of the poll, most members of the public were generally
positive about chemistry, if a little hazy on its specific benefits and
exactly what chemists do. The strongest reaction to the central science
was not fear or confusion, but simple indifference. That can be
useful. It creates what public-relations experts call a void in the collective
consciousness, one that they can fill with positive images.

If the British public likes chemistry, at least more than the
chemists believed, then it is positively glowing about science in general.
Survey respondents described it with words such as ‘welcoming’,
‘sociable’ and ‘fun’. And a separate poll by Ipsos MORI this year showed
that scientists are among the most trusted professionals in Britain;
some nine in ten people said that they trust scientists to follow all of
the research rules and regulations relevant to them.

How many scientists would say the same? Not many, probably, of the
attendees at this week’s 4th World Conference on Research Integrity in
Rio de Janeiro, Brazil. As we report on page 14, attendees at the
weekend discussed the latest high-profile case of scandal, fraud allegations
and retraction. The attention drawn by the paper in question,
discussing how views on same-sex marriage can be changed, prompted
The New York Times to publish an editorial titled ‘Scientists who cheat’.
That will not help to fill any void with positive images.

Some scientists do cheat, of course, just as some scientists drive too
fast, take drugs and are unfaithful to their spouses. The reasons are
complex and varied. With some exceptions, scientific organizations
do not engage with the issue of misconduct as seriously as they should.
Why would they, when public confidence and (moral and financial)
support remain so high? Media coverage of the same-sex-marriage
retraction was laced with portentous language, claiming that faith and
trust in science had been profoundly shaken. Yet, as researchers
who follow misconduct issues will know, faith
and trust in science have survived worse in recent years.

“Nine in ten people trust scientists to follow the rules. How many scientists would say the same?”

That should not be taken as an excuse to ignore the problem of 
research misconduct or to minimize its importance. And although
high-profile fraud makes headlines, a broader and more common set of
unappealing behaviours, from corner-cutting to data-juggling, lies under
the surface. Convention says that a tiny minority of scientists cheats, yet
academics and researchers frequently make the case that irregularities 
are widespread. A 2014 survey of hundreds of economists, for example, 
found that 94% admitted to having engaged in at least one “unaccepted” 
research practice (S. Necker Res. Policy 43, 1747-1759; 2014). 

Just like with British chemistry, it seems that the wider public's view 
of science and research is rosier than that of many people who are 
directly involved. For how long can this continue?


To Pluto 


The coming months promise to shed new light 
on the Solar System’s underworld. 


Pluto has always had something of a raw deal. In classical
mythology, while Zeus got sovereignty of heaven and Poseidon
mastery of the seas, their brother Pluto (former name, Hades)
was lumbered with the underworld and its legions of the dead. Pluto
the planet had its discovery delayed by a decade-long legal battle, and
then barely made it into the textbooks of the twenty-first century
before astronomers decided to strip away its full planetary status. Its
classification of dwarf planet is still contested by some. To others, it is
the first example of the plutoid category of trans-Neptunian objects.

And then there are its moons. When the fourth of Pluto’s satellites
was discovered in 2011, a campaign headed by Star Trek actor William
Shatner proposed the name Vulcan. Preferring to maintain the
underworld theme, astronomers chose Cerberus, after the dominion’s
three-headed guard dog, although because that is already the name of an
asteroid, they had to settle on the Greek spelling, Kerberos.

The other minor moons, Styx (gloomy river and one-time plunge
pool for the infant Achilles), Hydra (many-headed serpentine
sentinel) and Nix (a variant spelling of Nyx, goddess of the night) are joined
by a large moon about half the size of Pluto called Charon (ferryman of
the Styx and son of Nyx). Charon, some astronomers say, forms with
Pluto the Solar System’s only binary planet. Or perhaps that should be
double plutoid system.

In other words, nothing about this corner of our Solar System has
been straightforward. And as the NASA spacecraft New Horizons
hurtles towards it for the first close-up look at these bodies, astronomers
this week pose new questions about the heavenly body formerly known 
as the planet Pluto. The answers, some of which might come when New 
Horizons flies past the dwarf planet in mid-July, could help
researchers to understand how planets and their moons form in the first place.

Little is known about Pluto’s creation, but astronomers had assumed
that it formed from the remains of a collision between proto-Pluto and
a proto-Charon. The smaller moons may have then come together
from bits among the swirling impact debris. The 2012 discovery of
Styx was already something of a surprise, because studies had
suggested the other three smaller moons were packed so closely together
that there was no room for another.

On page 45 of this issue, planetary scientists Mark Showalter 
and Douglas Hamilton describe how they analysed Hubble Space 
Telescope images to build up a picture of the orbital configurations 
and brightnesses of Pluto’s small moons. They find that Styx, Nix 
and Hydra are locked together in what astronomers call three-body 
resonance, a phenomenon that links the timing of their orbits and 
usually makes their movements stable. 

They also suggest that Kerberos is a little out of place. Although Nix 
and Hydra have bright surfaces similar to that of Charon, Kerberos 
appears as dark as coal, and this raises questions about how this mixed
satellite system might have formed. (Pluto is the brightest of the lot,
with a reflectivity roughly that of sea ice.) Their findings are discussed
in a News & Views article on page 40.

The mythological name of Hades for the god of the underworld 
was replaced because Pluto has a more positive spin: it
associated the ruler with the mineral wealth found
underground. The next few weeks promise a
revival of interest in Pluto, and a polishing of
its image too. It deserves its time in the (dim
and distant) Sun.
WORLD VIEW


Take concepts of chemistry
out of the classroom


The public image of chemistry is not as negative as some assume, but many
people find it hard to connect the field to the real world, says Chiara Ceci.


Much attention is paid to public attitudes to science. But
how much do we think about scientists’ attitudes towards
the public? For members of a profession that thrives on
evidence, scientists (and those who communicate, advocate and
lobby for science) too frequently rely on incorrect assumptions.

Scientists often believe that the public thinks poorly of them, and
perhaps chemists more than most. We assume that people think in
stereotypes: men in white coats, explosions and harmful chemicals.
We see scare-mongering headlines and misleading advertising about
‘artificial’ versus ‘natural’ products. We assume that these messages
carry influence, and this shapes everything from the way we hold
conversations at parties to more formal efforts in public outreach
and education. We are defensive, because we assume that chemistry
is under attack.

In fact, public attitudes to chemists and
chemistry are much more positive than my colleagues
and I would have dared to hope. Our views of
public opinion are too negative. I know this
because the Royal Society of Chemistry (RSC)
has asked members of the public what they think.

The results should cheer up chemists
everywhere, and perhaps encourage all scientists to
take a more nuanced view of what the public does
and does not understand about science.

As part of the study, members and staff of the
RSC were asked how they thought the public
would respond. The chemists said that public
activities should counter the negative
stereotypes and myths that surround chemicals. Just
over half expected most of the public to say that
all chemicals are dangerous and harmful. Some 80% thought that the
public would consider chemists unapproachable. And a little less than
one-third of the chemists believed that the public would say the
benefits of chemistry outweighed the harmful effects. They were wrong.

The results of the study show that the public does not fear or
misunderstand chemistry. It does not rave about it either. The majority
feeling towards chemistry expressed in the survey was ‘neutral’. (Although
slightly more people reported positive feelings than negative.)

Overall, three-quarters of people said that chemistry had a positive
impact on well-being. A majority agreed that chemistry was part of
the solution, not the problem, on issues including sustainable energy,
access to food and drinking water, and pollution.

Contrary to our expectations, there were few spontaneous negative
associations. Only 1% of the public said that chemistry was boring,
difficult or confusing. And only 1% mentioned
explosions or blowing things up. Three times
as many associated chemistry with “attraction
between people”.

IF THEY HAVE FEW DIRECT ASSOCIATIONS WITH CHEMISTRY, PEOPLE DEFAULT TO MEMORIES OF SCHOOL.

Research on public attitudes to science and
scientists is relatively common, but work on specific fields, including
chemistry, is less so. If we are serious about science communication, 
we should seek insight into our audience and new ways to measure 
our impact. 

The results of the RSC’s study, published this week and available
in full at http://rsc.li/pac, show that the biggest public challenge
facing chemistry is not the need to overturn negative images, but to
convince people of the field’s relevance. If they have few direct associations
with chemistry, people default to memories of school experiences.
They see chemistry as an abstract pursuit, rather than a real science.

When asked to describe science more broadly, people used terms
such as ‘busy’ and ‘discovery’, whereas chemistry was burdened with
‘methodical’ and ‘concentration’. People struggle to imagine how
chemistry affects their everyday lives and regard chemists as lacking in
agency: they do not recognize how chemists are
involved in the end product of their own work.
Chemistry is a “science for scientists”, rather than
for the public.

Chemistry has long provided insight, building 
blocks and essential tools that are exploited by 
researchers in other disciplines. It underpins so 
many aspects of science that it gets lost. To bridge 
the distance between chemistry and society, we 
need to make the field more tangible for people. 

How can this be done? A gap between two of
the most significant findings offers an
opportunity. Although the overwhelming majority
of the people polled said that chemistry offered 
benefits, they did not have much knowledge or 
experience of how it actually does this. This is a 
void that can be filled with positive examples and role models. We are 
pushing against an open door. 

One idea that was popular in the survey was to take chemistry away 
from the classroom in people’s minds and to place it in the kitchen. 
Food and cooking show people that chemistry is not the sole territory 
of experts. Members of the public liked the idea that we are all
chemists in a way: it builds up their confidence and they start thinking of
chemistry as part of life rather than a subject that they will be tested on. 

The research threw up one major obstacle for chemistry that may 
be unique to the United Kingdom. When you tell British people that 
you are a chemist, it seems that most assume you work in a pharmacy. 
On these shores, it could be a useful first step for us to say that we are 
‘scientists who work in chemistry’. More broadly, before chemists or
any other groups try to influence public attitudes towards science, it is
important that we examine what we think of the public.


Chiara Ceci is a science communicator at the Royal Society of 
Chemistry in Cambridge, UK. 
e-mail: cecic@rsc.org 
RESEARCH HIGHLIGHTS

Selections from the
scientific literature


Gene therapy halts
type 1 diabetes


Transferring part of an insulin
gene into liver cells triggers a
specific immune response that
protects mice from one form of
diabetes.
Type 1 diabetes occurs
when T cells target and kill
insulin-producing islet
cells in the pancreas. Maria
Grazia Roncarolo of Stanford
University in California and
her team transferred a gene
fragment encoding some of the
insulin B chain into the livers
of mice engineered to develop
this disease, and monitored the
effects. Islet cells lived for up
to 33 weeks after treatment of
animals in a prediabetic state.
In untreated mice, around
80% of insulin-producing cells
were destroyed. The transfer, in
combination with an antibody,
reversed symptoms in mice
that had developed diabetes.
The gene fragment
stimulated regulatory T cells
that are specific for insulin,
suppressing the
insulin-attacking T cells.
Sci. Transl. Med. 7, 289ra81 (2015)


Memory metal
sets flex record


An alloy that can bend and
return to its original shape at
least 10 million times could
prove useful in applications
including medical devices and
refrigeration.

Bending a ‘shape-memory
alloy’ changes its crystal
structure from one phase to
another, whereas applying heat
reverses that transition. But
structural damage causes these
materials to lose their shape
memory within a few thousand
cycles. A team led by Eckhard
Quandt at the University of
Kiel, Germany, and Manfred
Wuttig at the University of
Maryland, College Park, has
now created a
titanium-nickel-copper alloy (Ti54Ni34Cu12)
that averts this memory loss.
Layers of Ti2Cu in the material
act as templates that guide the
complete transition between
the two crystal phases.

This template approach
could offer a way of creating
better shape-memory alloys.
Science 348, 1004-1007 (2015)


VOLCANOLOGY


New islands reveal Red Sea rifting


Two volcanic islands that have emerged in the
southern Red Sea suggest that the area is more
geologically active than was thought.

Sholan Island surfaced in December 2011 and
Jadid Island appeared in October 2013, forming
part of the Zubair archipelago (pictured). Seismic
data and satellite radar measurements show that
both islands were created by magma squirting up
along north-south fractures under the sea floor,
says a team led by Sigurjon Jonsson of the King
Abdullah University of Science and Technology
in Thuwal, Saudi Arabia. The area is seeing a
decades-long episode of rifting, in which one
plate of Earth's crust pulls apart from another.

Observing newly formed islands in such detail
is rare, and the islands will probably remain
above water despite erosion, say the authors.
Nature Commun. 6, 7104 (2015)


Tropics feel effect
of iceberg thaw


Prehistoric icebergs in the
North Atlantic had a greater
influence on tropical climate
than was previously thought.
Rachael Rhodes of Oregon
State University in Corvallis
and her colleagues constructed
a 60,000-year methane record
from a west Antarctic ice
core. They found elevated
methane levels during cold
periods, which seemed to
coincide with ‘Heinrich
events’, the breaking off
of icebergs from Greenland
glaciers on a massive scale. The
team suggests that fresh water
flooding into the Atlantic from
the thawing icebergs helped to
cool the Atlantic region, which
contributed to the slowing
down of ocean circulation. This
led to increased rainfall in the
tropics, where wetlands grew
and produced more methane.

The climatic impact of some
Heinrich events lasted for
up to 1,500 years, suggesting
that Atlantic circulation was
weakened for much longer
than the thaw periods.
Science 348, 1016-1019 (2015)


Antifungal drug 
dodges resistance 


A yeast-killing compound 
evades drug resistance and is 
less toxic than a related drug 
used in the clinic. 

The antifungal drug 
amphotericin B (AmB) does 
not typically result in resistant 
fungi, but it kills human cells so 
can be used only at low doses. 
To create AmB derivatives that 
are less toxic to humans and 
do not cause resistance, Susan
Lindquist at the Whitehead
Institute for Biomedical 
Research in Cambridge, 
Massachusetts, Martin Burke 
at the University of Illinois at 
Urbana-Champaign and their 
colleagues used just three steps 
of chemical synthesis. 

The new compounds killed 
infectious yeast in the lab and 
in mice, but were less toxic to 
human cells and mice than 
AmB. Yeast strains that were 
resistant to the compounds 
in vitro were unable to cause 
lethal infections in mice, 
unlike non-resistant strains, 
suggesting that drug-resistant 
strains are less fit. 

The new antifungals kill
yeast by pulling out ergosterol
molecules from the yeast cell
membrane, but they do not bind to the
similar molecule cholesterol in
animal cell membranes.

Nature Chem. Biol. http://dx.doi.org/10.1038/nchembio.1821 (2015)


EVOLUTION 


Migration explains 
drab female birds 


Some female warblers lost 
their bright colours just as the 
birds were evolving to become 
migratory, suggesting that this 
behavioural change spurred 
the evolution of sex differences 
in plumage colour. 

To find out why female 
songbirds are often as colourful 
as the males in tropical species 
but less colourful in northern 
ones, Troy Murphy at Trinity 
University in San Antonio, 
Texas, and his colleagues 
studied 108 species of wood 
warblers (Setophaga tigrina; 
female pictured left, male 
pictured right). Migratory 
species tend to live farther 
north, and the authors found 
that the longer the bird’s 
migration, the more distinct 
the sexes look. In multiple 
species, these sex 
differences evolved 
at around the 
same time as the birds 
first began migrating. 


The findings suggest that 
sex differences in colour are 
driven by the needs of females. 
Non-migratory females 
often defend their territories 
using bright colours to signal 
fighting ability. But females 
that migrate rarely act in 
this way, and bright colours 
could make them more visible 
to predators during their 
migration. 

Proc. R. Soc. B 282, 20150375 
(2015) 


NEUROSCIENCE
Stroke brain still 
controls device 


Rats can use their brain 
activity to control an external 
device through an implanted 
electrode, even after a stroke. 
The finding suggests that 
people who have motor 
problems as a result ofa stroke 
could one day benefit from 
such brain-machine interfaces. 
Karunesh Ganguly at the 
San Francisco Veterans Affairs 
Medical Center in California 
and his colleagues placed 
electrodes near the part of the 
motor cortex in the rat brain 
that was injured by stroke, and 
then trained the animal to shift 
the angle of a water-feeding 
tube using just its brain activity. 
The team found that stroke- 
affected rats learned this task 
as quickly as control animals, 
even though the stroke 
animals showed only minimal 
improvements in movement. 
The results suggest that the 
brain area injured by a stroke
can still form new brain-cell 
connections. 
J. Neurosci. 35, 8653-8661 (2015) 


CLIMATE-CHANGE BIOLOGY 


Warming threat to 
ocean biodiversity 


Marine biodiversity could 
undergo drastic changes in 
as much as 70% of the world’s 
oceans if global warming is 
not limited to below 2°C 
by 2100. 
Grégory Beaugrand
at the CNRS Laboratory
of Oceanology and
Geosciences in
Wimereux, France, Richard
Kirby at the University of
Plymouth, UK, and their
colleagues modelled how
patterns of biodiversity across
the oceans would change
under different future climate
scenarios, and compared
them to patterns over the past
50 years and during prehistoric
warm and cold periods.

With low levels of warming
(mean temperature rise of
roughly 1 °C), around 16% of
the ocean would see increased
biodiversity through species
invasions and about 6% of
oceans would experience a
decrease. In the most extreme
warming scenario, of roughly
3.7 °C, these numbers rise
to about 32% and 44%.

Such severe warming could
produce a greater change
in marine biodiversity than
has been seen over the past
3 million years or so.

Nature Clim. Change http://dx.doi.org/10.1038/nclimate2650 (2015)


ECOLOGY


Coral faces algal
sabotage


Caribbean corals have been
invaded by algae that slow their
growth and may have been
introduced by humans.

Tye Pettay and Todd
LaJeunesse at Pennsylvania
State University in University
Park and their colleagues
sampled various coral species
(Orbicella faveolata; pictured)
from around the world and
analysed the genetics of their
symbiotic algae. They found
that one alga in the Caribbean,
Symbiodinium trenchii,
comprised just a few lineages
that were closely related to
those in the Indian and Pacific
oceans. Corals living with
this symbiont tolerated high
temperatures better than those
without it, but incorporated
calcium into their skeletons at
around half the rate.

The findings indicate that
this alga invaded the Caribbean
thanks to human activities, and
could have negative long-term
ecological impacts in this
region.

Proc. Natl Acad. Sci. USA http://dx.doi.org/10.1073/pnas.1502283112 (2015)


SOCIAL SELECTION


Popular topics
on social media


How best to respond to reviewers


Comments from referees reviewing a paper can sometimes
be less than polite, making it tempting for authors to send
equally rude replies. But a trio of blog posts emphasizes the
importance of professional, constructive responses from
authors (see go.nature.com/yzwvmt; go.nature.com/hzp3bg
and go.nature.com/hchv3i). The posts, by three ecologists,
aim to help researchers to avoid common pitfalls that can
lower their chances of publication. Commenters on Twitter
appreciated the guidance. Responding to one of the blogs,
Auriel Fournier, a PhD student at the
University of Arkansas in Fayetteville,
tweeted: “I’m struggling with this
right now, this was a very helpful
and timely post.”




4 JUNE 2015 | VOL 522 | NATURE | 9 


© 2015 Macmillan Publishers Limited. All rights reserved 


SEVEN DAYS


Nuclear restart 


Japanese regulators have 
granted the first permit to 
restart a nuclear reactor 
following the Fukushima 
disaster in 2011. Issued
by the country’s Nuclear
Regulation Authority on
27 May, the permit will
allow the Kyushu Electric
Power Company in Fukuoka
to restart two reactors at
the Sendai Nuclear Power
Plant, with the first coming
online as early as July. Japan 
halted its 43 operable nuclear 
reactors in September 2013, 
pending a safety review by 
the regulators. The Japanese 
government is currently 
considering a draft energy 
plan, which projects that 
nuclear power could account 
for up to 22% of the nation’s 
electricity by 2030. 


Reef unlisted 


Australia’s Great Barrier 
Reef has not been put 
on the United Nations’ 
list of heritage sites 
that are in danger — to 
the consternation of 
conservationists. A draft 
document released by the UN 
on 29 May instead advises the 
World Heritage Committee, 
which decides on the list, 
to welcome progress made 
by Australia in protecting 
the reef. It also advises 
regular checks to ensure 
that the country’s 35-year 
sustainability plan for the 
reef is working. The reef’s 
status will be finalized at the 
committee’s meeting in Bonn, 
Germany, at the end of this 
month. 


14.9 m 


The number of new cancer 
cases globally in 2013, 
according to a study published 
on 28 May. 

Global Burden of Disease Cancer 
Collaboration JAMA Oncol. http://doi.org/4w4 (2015). 


Rosetta zooms in on heart of comet 


This craggy landscape might look like an Alpine 
peak, but it is actually the nucleus of comet 
67P/Churyumov-Gerasimenko, the target of 
the European Space Agency's Rosetta spacecraft. 
Almost 1,800 images taken by the probe’s 
navigation camera from as little as 8 kilometres 
above the comet’s surface were released by the 
Rosetta team on 28 May. They were taken just 
before and after the Philae lander touched down 
on the comet in November last year. This image 
captures an area spanning 785 metres across the 
neck of the rubber-duck-shaped comet. Rosetta 
continues to orbit the comet as it heads for its 
closest approach to the Sun in August. 


China ivory trade 
Conservationists welcomed 
news that China plans to 
phase out its legal ivory trade. 
In a symbolic gesture, about 
680 kilograms of confiscated 
illegal ivory were destroyed 
in Beijing on 29 May. At the 
event, Zhao Shucong, head 
of the Chinese State Forestry 
Administration, declared 
that China “will strictly 
control ivory processing and 
trade until the commercial 
processing and sale of 
ivory and its products are 
eventually halted”. 


EVENTS 


US anthrax blunder 
The US Department of 
Defense announced on 
27 May that it had accidentally 
shipped live anthrax spores 
to labs in nine US states and 
a US military base in South 
Korea. The facilities that 
received the samples did not 
have systems in place to protect 
employees against anthrax 
exposure because they were 
expecting to receive killed 
spores. It is unclear how many 
people were exposed. Some 
workers are now receiving 
preventive treatment. The 
incident follows a series of 
biosafety lapses last summer 
at US government agencies. 
See go.nature.com/dc2anv for 
more. 


Antelope die-off 
Almost half of the global 
population of saiga antelopes 
has been killed off in just a few 
weeks. The United Nations 
Environment Programme 
(UNEP) said on 28 May that 
more than 120,000 of the 
critically endangered animals 
(Saiga tatarica) have died 
in the Betpak-Dala area of 
central Kazakhstan. Four 
main populations of saigas 
live in Kazakhstan and Russia, 
and a 2014 census reported 
that about 262,000 animals 
live worldwide. UNEP says 
that a preliminary analysis 
suggests that both biological 
and environmental factors 
may have caused the die-off. 
See go.nature.com/nkqc81 for 
more. 


AWARDS 


Shaw prizes 

The three US$1-million 
prizes from the Shaw Prize 
Foundation were announced 
in Hong Kong on 1 June. 
William Borucki of NASA’s 
Ames Research Center in 
Mountain View, California, 
received the astronomy 
prize for leading the Kepler 
exoplanet-hunting mission. 
Microbiologists Bonnie 
Bassler of Princeton University 
in New Jersey and Peter 
Greenberg of the University 
of Washington in Seattle 
shared the biology prize for 
their discovery of bacterial 
communication, or ‘quorum 
sensing’. Gerd Faltings of 
the University of Bonn in 
Germany and Henryk Iwaniec 
of Rutgers University in New 
Jersey shared the mathematics 
prize for their breakthroughs 
in number theory. 


PEOPLE 


Oxford head 
Political scientist Louise 
Richardson (pictured) 
looks set to become the 
next vice-chancellor of the 
University of Oxford, UK, 
having been nominated 
for the role on 28 May. 
Richardson, whose research 
covers terrorism and security 
issues, will be the first woman 
to head the university in its 
more than 800-year history. If 
her appointment is approved 
by the university's decision-making 
body, known as 
Congregation, Richardson will 
take over from current vice-chancellor 
Andrew Hamilton 
on 1 January 2016. She has 
held leadership positions at 
the University of St Andrews, 
UK, and Harvard University in 
Cambridge, Massachusetts. 


TREND WATCH 


The number of undernourished 
people worldwide has fallen to 
795 million people, down from 
1 billion in the early 1990s, 
according to a report by the Food 
and Agriculture Organization of 
the United Nations. Economic 
growth and social policies — 
which promote better nutrition, 
health care and education — have 
helped to beat hunger in many 
developing regions. There has 
been little progress in southern 
Asia and sub-Saharan Africa, 
where political instability has led 
to food insecurity. 


FUNDING 
Funds ring-fenced 


Three basic-research funding 
programmes have been 
sheltered from a controversial 
budget raid on the European 
Union (EU) Horizon 2020 
funding framework. The 
siphoned money — which 
now stands at €2.2 billion 
(US$2.4 billion) instead of 
€2.7 billion — will establish 
the European Fund for 
Strategic Investments. The 
European Parliament, Council 
and Commission agreed 
on 28 May to ring-fence the 
budgets of the European 
Research Council, the Marie 
Skłodowska-Curie research 
fellowship and a programme 
that supports researchers in 
low-income EU countries. The 
agreement must be ratified by 
the European Parliament on 
24 June. 


Hawaii compromise 


At least one-quarter of the 
13 telescopes atop Hawaii's 
sacred mountain Mauna Kea 
must be removed by the time 
the planned Thirty Meter 
Telescope (TMT) begins 
operating there next decade, 
Hawaiian governor David Ige 
said on 26 May. Construction 
of the TMT halted in early 
April when protestors blocked 
the road to the mountain’s 
summit. The governor also 
said that the University of 
Hawaii at Manoa, which 
leases the land on which the 
telescopes sit, must provide 
cultural training for Mauna 
Kea visitors and promise that 
this will be the last telescope to 
be built at the site. See page 15 
and go.nature.com/ufodat for 
more. 


GLOBAL HUNGER 
The number of undernourished people has fallen by about 
one-fifth globally, but progress differs between regions. 


6-9 JUNE 

The latest developments 
in human and 
medical genetics will 
be discussed at the 
European Human 
Genetics Conference in 
Glasgow, UK. 
go.nature.com/tndjgh 


6-13 JUNE 

Delegates from United 
Nations member states 
meet in Rome for the 
Food and Agriculture 
Organization's annual 
conference. Topics 
include addressing 
poverty, food security 
and climate-change 
impacts. 
go.nature.com/aniorh 


7-11 JUNE 

Scientists gather in 
San Antonio, Texas, 
for the American 
Nuclear Society’s 
annual meeting. This 
year’s theme is ‘Nuclear 
technology: an essential 
part of the solution’. 
go.nature.com/Ina2dk 


X-ray upgrade 

The European Synchrotron 
Radiation Facility (ESRF) 
in Grenoble, France, 
began a major upgrade 
on 29 May. Already the 
world’s most intense 
X-ray source, the ESRF 
is investing €150 million 
(US$165 million) in a new 
accelerator that will deliver 
even more tightly focused 
beams, creating X-rays 
that are 100 times brighter. 
Installing the new machine 
in the ESRF’s present tunnels 
will entail a 17-month 
shutdown starting at the end 
of 2018. The upgraded source 
should allow researchers to 
image materials and observe 
chemical reactions at the 
nanometre scale. 
NEWS IN FOCUS 


with Famiglia Cristiana magazine that they 
are looking into theories that the bacterium 
may have been deliberately introduced into 
the area, or became entrenched because 
agricultural scientists failed to monitor the 
region properly, either deliberately or through 
neglect. 

On 12 May, the Italian Association of Scientific 
Societies in Agriculture (AISSA), which 
represents 4,000 scientists in Italy, published 
a public letter defending the Puglian scientists 
and their work. “The claims do not have 
a scientific basis — that’s what has shocked the 
scientific community,” says Vincenzo Gerbi, 
AISSA president. 

Puglian scientists have had to contend with 
public criticism, too. Several popular blogs 
devoted to the Xylella emergency have cast 
doubt on scientists’ ways of working and their 
results — saying, for example, that a cure exists 
but is being suppressed. And Peacelink, an Italian 
non-governmental organization, wrote to 
the EU health commissioner in March saying 
that Xylella had not been proved to be the 
source of the outbreak, and that the deaths 
were instead the result of a fungus that could be 
eliminated without destroying trees. An expert 
panel of the European Food Safety Authority 
debunked these suggestions in a report published 
in April. “It’s frustrating to hear all these 
complaints when you think you are doing a 
public service,” says Anna Maria D’Onghia, 
head of the pest-management division at the 
IAMB, who has been questioned by police. 
“We are always being attacked for doing too 
little, or the wrong things.” 


Donato Boscia researches Xylella fastidiosa at 
Italy’s Institute for Sustainable Plant Protection. 

Boscia says that the “attempts to delegitimize 
the results of scientific research” have 
been worse than the police investigations. But 
it is not all bad news for Puglian scientists. On 
27 May, the regional government announced a 
€2-million (US$2.2-million) fund for projects 
that might aid the diagnosis, epidemiology 
and monitoring of the bacterium. It said that a 
‘containment area’ in the province of Lecce — 
where the bacterium is now endemic, making 
complete eradication impossible — will be used 
as an open-air Xylella laboratory. National and 
European research agencies have also promised 
money, says Boscia. “The outdoor laboratory 
would be perfect for all of us — and also allow 
critics to put their own theories to the test.” 


1. Saponari, M., Boscia, D., Nigro, F. & Martelli, G. P. 
J. Plant Pathol. http://dx.doi.org/10.4454/JPP.V95I3.035 (2013). 

2. Elbeaino, T. et al. Phytopathol. Mediterr. 53, 328-332 
(2014). 
POLITICAL SCIENCE 


Retracted gay-marriage study 
debated at misconduct meet-up 


Over rum cocktails at the World Conference on Research Integrity, experts discussed what 
can be learnt from the fallout of a flawed political-science paper. 


BY RICHARD VAN NOORDEN, RIO DE JANEIRO 


The world’s largest gathering of specialists 
in research misconduct kicked off on 
31 May in Rio de Janeiro, Brazil, shortly 
after science’s latest scandal broke. On the 
evening before the start of sessions on how to 
diagnose and remedy ethical faults in research, 
delegates to the 4th World Conference on 
Research Integrity sipped caipirinhas, Brazil's 
national cocktail — and swapped views on 
what could be gleaned from a flawed political-science 
study. 

The paper in question, which claimed to 
show that short conversations with a canvasser 
who is gay could encourage voters to support 
same-sex marriage, made headlines across the 
world when it was published in Science last 
December (M. J. LaCour and D. P. Green Science 
346, 1366-1369; 2014) — and again when 
it was retracted last week (Science http://doi.org/4zt; 
2015). “The case is very much on our 
minds,” said Melissa Anderson, a co-organizer 
of the meeting who studies scientific integrity 
at the University of Minnesota in Minneapolis. 

Although the case throws up new instances 
of misconduct, and of inadequate supervision 
by senior academics, delegates to the Rio conference 
felt that, in general, the case illuminated 
little about the academic system that a steady 
drip-drip of research misconduct has not already 
highlighted. The main challenge, said Brian 
Martinson, a social scientist at the HealthPartners 
Institute for Education and Research in 
Minneapolis, is how to create a supportive environment 
that incentivizes reliable, reproducible 
research. “A lot of people think the bad stuff in 
science comes from academics being greedy or 
narcissistic — but that ignores how the structural 
arrangements in science, like the decline of funding 
and stable academic positions in the United 
States, leads people into bad behaviour,” he said. 

In the latest twist in the debacle, co-author 
Michael LaCour, a graduate student in political 
science at the University of California, Los Angeles 
(UCLA), has admitted to misrepresenting 
his funding sources and the incentives he used 
to attract people to take part in the study. In a 
29 May online reply to researchers who had 
spotted irregularities in his survey data (see 
go.nature.com/acpxnh), LaCour said that he 
had deleted his raw data for reasons of confidentiality 
and admitted that he did not get 
ethical approval from an institutional review 
board before he did the work, or before he 
submitted it to Science. The document did not 
include convincing evidence that he had 
conducted the surveys. 

LaCour told The New York Times that he 
stands by his finding — but his co-author 
Donald Green, a political scientist at Columbia 
University in New York City, does not: 
Green requested the paper’s retraction after 
three outside scientists told him about irregularities 
in its survey data, and he apologized for 
not adequately supervising LaCour’s work. 


“Academia should be concerned that its system 
of checks and balances has problems.” 

Delegates in Rio broadly agreed that the 
case highlights the need for better supervision 
by senior academics. “Academia 
should be concerned that its system of checks 
and balances has problems,” said Nicholas 
Steneck, who studies research integrity at 
the University of Michigan in Ann Arbor. 
“It will never be perfect, but it is far from 
perfect now.” Sabine Kleinert, a co-organizer 
of the research-integrity conference and 
senior executive editor at The Lancet, said: 
“The wider lessons are still the same as many 
of these cases throw up — that of the role of 
the co-authors in taking steps to be accountable 
for the data, and the role of institutions 
in safeguarding or having repositories for the 
data underlying research that is done there.” 

On the plus side, the retraction came 
swiftly after queries were raised about the 
data, noted Ivan Oransky, a journalist who 
runs the blog Retraction Watch, which first 
reported that Green had asked for the study to 
be retracted. Researchers posted their objections 
online on 19 May (see go.nature.com/qgrdav) 
and Science retracted the study on 
28 May. That is in stark contrast to an earlier 
misconduct case — involving the cancer 
geneticist Anil Potti — in which whistleblowers 
tried for years to quietly raise concerns 
with Potti’s institution, Duke University 
in Durham, North Carolina, before papers 
were finally retracted and Potti resigned. 

Mysteries still linger in the LaCour 
case. In the 23-page reply that he posted 
on 29 May, LaCour raises statistical objections 
to the criticisms levelled at him. These 
“couldn’t possibly be more beside the point”, 
said Jelte Wicherts, a statistician at Tilburg 
University in the Netherlands. 

LaCour also posted snapshots of an apparent 
survey set up with the firm Qualtrics, but 
these actually relate to a pilot study that was 
abandoned, according to Chris Skovron, 
a political scientist at the University of 
Michigan. He had worked on the study until 
LaCour cut off the collaboration, he says. 

As to whether canvassing changes voters’ 
attitudes, Brian Calfano, a political scientist 
at Missouri State University in Springfield, 
says that other literature suggests that 
it can, but that replication or extension of 
the LaCour-Green work would have to be 
done to know for certain that it does so in 
this particular scenario. LaCour wrote in his 
29 May document that Calfano had replicated 
his study, but Calfano says that his own 
work is only a preliminary finding relating to 
a different kind of canvassing of voters. He 
shared the finding with LaCour at an early 
stage, but is not willing to stand behind it 
until further tests are completed. 

LaCour did not respond to a request for 
comment. His graduate supervisor, political 
scientist Lynn Vavreck, says that UCLA has 
an ongoing inquiry into the issue. 


The Gemini North telescope is one of several world-class astronomy facilities on Mauna Kea. 


ASTRONOMY 


Hawaiian telescopes pruned 


Cultural fight over sacred mountain Mauna Kea prompts rule change. 


BY ALEXANDRA WITZE 


The quest to build one of the world’s 
largest telescopes has radically 
reshaped the future of a Hawaiian 
mountain. On 26 May, Hawaii governor David 
Ige announced that the controversial Thirty 
Meter Telescope (TMT) could be built on 
Mauna Kea as planned — but that three or four 
of the mountain’s 13 existing telescopes must 
be dismantled over the next decade. 

Mauna Kea is home to such world-leading 
facilities as the twin 10-metre Keck telescopes 
and the 8-metre-class Subaru and Gemini 
North telescopes (see ‘Starry summit’). Speculation 
is already running high about which 
telescopes will be removed, and when. 

Native Hawaiians regard Mauna Kea as 
sacred, and they view building the TMT as 
another violation of an already desecrated 
site. Construction was to have begun in early 
April, but was put on hold when protests broke 
out on the mountain, in Honolulu and at other 
sites across the islands. 

Ige’s announcement, a direct response to 
the unrest, accelerates long-standing plans to 
decommission Mauna Kea telescopes as they 
grow older. “The idea of removing telescopes 
from the summit is not a new one,” says Doug 
Simons, director of the Canada-France-Hawaii 
Telescope on Mauna Kea. “It’s the natural evolution 
of a set of observatories that are ageing 
in a lot of ways.” 

The governor has ordered the University 
of Hawaii at Manoa, which leases the mountain 
top as a science reserve, to close 25% 
of the observatories there before the TMT 
begins operation in the mid-2020s. The 
university owns a 2.2-metre optical telescope 
that is the oldest on Mauna Kea, dating 
back to 1970; a 0.9-metre educational 
optical telescope; and the 3.8-metre United 
Kingdom Infrared Telescope (UKIRT). It 
also manages the 3-metre Infrared Telescope 
Facility for NASA, which studies planets, 
asteroids and stars. 


STARRY SUMMIT 

Mauna Kea, a mountain on the Big Island of Hawaii, is a hotbed of 
astronomy. Since the 1960s, a series of observatories have sought to 
take advantage of the site’s high altitude and clear, dark skies. 
Legend (decade of opening): 1960s, 1970s, 1980s, 1990s, 2000s, under construction. 
Labels: Thirty Meter Telescope; Mauna Kea volcano. 

“We have always made the point that space 
on the top of the mountain should only be 
populated by the absolutely best telescopes,” 
says Günter Hasinger, director of the university’s 
Institute for Astronomy. 


EYES SHUT 

The first to go will be the Caltech Submillimeter 
Observatory, the closure of which was 
announced in 2009. It will end operations 
in September, and then will be dismantled. 
Other telescopes, including Keck, Gemini and 
Subaru, involve complex international agreements 
that cannot be overwritten by the state 
of Hawaii alone. All have committed to operating 
on the mountain to the end of 2033. 

“We intend to continue operating until 
we come to a point where the science return 
isn’t worth it,” says Raymond Blundell, an 
astronomer at the Harvard-Smithsonian 
Center for Astrophysics in Cambridge, 
Massachusetts, and director of the Submillimeter 
Array, an eight-dish radio telescope 
array on Mauna Kea. 

Some of the telescopes on the mountain 
have just begun a new lease of life. Earlier this 
year, a consortium of east Asian observatories 
took over the submillimetre-wavelength 
James Clerk Maxwell Telescope to study how 
galaxies and stars form, among other things. 
And UKIRT has just begun a long-term science 
programme that involves studying space 
debris and near-Earth asteroids, says director 
Richard Green, an astronomer at the University 
of Arizona in Tucson. 

For now, Green continues to plan for 
nearly two decades ahead — although he 
acknowledges that the situation may change. 
“We realize there has to be more attention 
paid to the culture and how the mountain is 
taken care of,” he says. 

In addition to closing telescopes, Ige levied 
a list of other requirements. When the 
University of Hawaii's lease ends in 2033, it 
must return to state protection more than 
40 square kilometres of the 45 it leases. 
Visitors to the summit must receive cultural 
training. And the TMT location, which is a 
few hundred metres beneath the actual summit, 
will be the last area on Mauna Kea on 
which any telescope will ever be built. 

Nearly every telescope project on Mauna 
Kea in recent years has faced local protests, 
although not the sustained high emotion 
inspired by the TMT. John Johnson, an astronomer 
at Harvard University in Cambridge, 
says that astronomers should not be on the 
mountain top at all, given the history of the 
Hawaiian Islands. “This goes way beyond 
whether we construct this telescope or not,” 
he says. “It has to do with the fact that the 
United States stole Hawaii from a sovereign 
people and proceeded to systematically erase 
that culture.” 

The university says it will have a plan 
for removing 25% of the observatories 
by the end of this year. The TMT has not 
announced whether and when it will resume 
construction, and legal challenges to the 
project are still wending their way through 
Hawaiian courts. 

Two competing next-generation telescopes 
are being planned for Chile. 
Atomic clocks face off 


Next generation of hyper-precise timekeepers can only be tested against each other. 


BY ELIZABETH GIBNEY 


Happy birthday, caesium clock. Now 
move over. As the atomic clock used 
to define time itself turns 60, tests are 
set to begin on a new generation of clocks that 
are designed to give the caesium version a run 
for its money. 

Such timekeepers would enable a variety of 
experiments, including testing whether the 
fundamental constants of nature really are 
constant over time, and, eventually, a more 
precise official definition of the second. 
Atomic clocks track the frequency of 
electromagnetic waves emitted by atoms as 
they change energy states. First demonstrated 
by British physicist Louis Essen in June 1955, 
the caesium clock became the world’s official 
timekeeper in 1967 — defining the second 
as the time it takes for the microwaves that 
are absorbed or emitted when caesium 
atoms switch between states to cycle through 
9,192,631,770 oscillations. 

Over the past decade, various laboratories 
have created prototype optical atomic clocks, 
which use different elements such as strontium 
and ytterbium that emit and absorb 
higher-frequency photons in the visible 
spectrum. This finer slicing of time should, 
in principle, make them more accurate: it is 
claimed that the best of these clocks gain or 
lose no more than one second every 15 billion 
years (10¹⁸ seconds) — longer than the current 
age of the Universe — making them 100 times 
more precise than their caesium counterparts. 
Optical clocks are claimed to be the best timekeepers 
in existence, but the only way to verify 
this in practice is to compare different models 
against each other and see whether they agree. 
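
The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a Julian year of 365.25 days, and the function name is invented for this example. It converts 15 billion years to seconds (about 4.7 × 10¹⁷, that is, of order 10¹⁸), derives the corresponding fractional timing error, and applies the stated factor of 100 to estimate the caesium figure.

```python
# Rough arithmetic behind the clock figures quoted in the article.
# Assumption: a Julian year (365.25 days); the helper name is hypothetical.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Caesium transition frequency that defines the second (cycles per second).
CAESIUM_HZ = 9_192_631_770


def fractional_error(drift_seconds: float, interval_years: float) -> float:
    """Fractional timing error of a clock that drifts by `drift_seconds`
    over a span of `interval_years`."""
    return drift_seconds / (interval_years * SECONDS_PER_YEAR)


interval_seconds = 15e9 * SECONDS_PER_YEAR   # 15 billion years, in seconds
optical = fractional_error(1.0, 15e9)        # one second gained or lost in that span
caesium = optical * 100                      # article: optical is 100 times more precise
cycles_per_day = CAESIUM_HZ * 86_400         # caesium oscillations counted in one day

print(f"15 billion years is about {interval_seconds:.2e} s")
print(f"optical clock fractional error is about {optical:.1e}")
print(f"implied caesium fractional error is about {caesium:.1e}")
print(f"caesium oscillations per day: {cycles_per_day:,}")
```

On these assumptions the optical clocks' claimed error is a few parts in 10¹⁸, and the implied caesium figure is a few parts in 10¹⁶.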

Starting on 4 June, four European laboratories 
will kick off this testing process — 
the National Physical Laboratory (NPL) in 
Teddington, UK; the department of Time-Space 
Reference Systems at the Paris Observatory; 
the German National Metrology Institute 
(PTB) in Braunschweig, Germany; and Italy’s 
National Institute of Metrology Research in 
Turin. Between them, the labs host a variety of 
optical clocks that harness different elements 
in different experimental set-ups. 

For the first test, each institute will transmit 
a signal related to the optical frequency of its 
clocks to a satellite, which will beam the frequencies 
back down to the other labs. This will 
allow the labs to compare the frequencies of 
light emitted by their clocks and thus measure 
whether they all keep time to the same beat. 

“It’s really exciting,” says Andrew Ludlow, a 
physicist at an optical-clock powerhouse run 
by the US National Institute of Standards and 
Technology (NIST) in Boulder, Colorado, 
who is not involved in the project. “A couple 
of comparisons of optical clocks have been 
made before, but on nothing like this scale.” 
With more clocks, it should be easier to root 
out the source of any discrepancies, adds Helen 
Margolis, a physicist at the NPL. 

She notes that a higher frequency does 
not necessarily mean a more accurate clock, 
because varying sensitivities to environmental 
factors can affect the ability of different clocks 
to keep time in practice. The hope is that all 
the clocks will agree, suggesting that they are 
as precise as claimed. If some clocks do not, 
it will indicate that improvements are needed. 

The initial test is only a prelude to a more 
accurate test, however, because it has one big 
limitation: to beam light to a satellite, it must 
be converted to a microwave frequency — 
which means that much of the potential extra 
accuracy gained by using visible light is lost.
By increasing the rate of data transfer, the
European labs hope to improve the accuracy of 
current state-of-the-art satellite comparisons 
by a factor of ten, but it will still be limited to one part in
10^16. So the main function of the satellite test is
to build confidence in optical clocks and show 
that they perform at least as well as existing 
caesium clocks, say researchers. 

The more accurate test will transmit signals in
the visible spectrum through fibre-optic
cables to the labs. This will allow the clocks to
be compared with an accuracy similar to
the expected accuracies of the clocks
themselves. Some of the labs have already
established such links, and tests have begun
on sections between Paris and Teddington,
and Paris and Braunschweig. “Eventually,
this would allow a four-way comparison.
That's the vision,” says Margolis.

“There is friendly competition,” she adds. 
“We all think our clocks have a very good 
potential for achieving the highest accuracy 
or we wouldn't be working on them.” 

Figure caption: A strontium-ion optical clock housed at the National Physical Laboratory in Teddington, UK.

Fibre-optic links between optical atomic
clocks already exist elsewhere, such as between
the NIST lab and its partner lab JILA, also in 
Boulder. But these span shorter distances than 
the European network and are mostly between 
just two labs. “Europe is in a unique position 
as it has a high density of the best clocks in the 
world,” says Fritz Riehle, a physicist at PTB.

Even if the clocks pass this later test, usurp- 
ing the caesium clock to create a more precise 
definition of the second will not be easy. Inter- 
national atomic time — on which coordinated 
universal time, or UTC, is based — is currently 
calculated by averaging measurements from 
hundreds of atomic clocks. Doing the same 
with optical atomic clocks would require 
a way to aggregate time at this precise level; 
using the fibre-optic method across oceans is 
not currently feasible. 

In the meantime, ever more precise time is 
important for improving global positioning 
systems, high-resolution radio astronomy and
the time-stamping of financial transactions, as 
well as spotting tiny variations in fundamen- 
tal constants. “Most attempts to unify gravity 
with other forces would lead to variations of 
fundamental constants in the expanding universe,”
says Marianna Safronova, a theorist at
the University of Delaware in Newark.

Joanne Liu visiting an MSF trauma centre in Kunduz, Afghanistan.


MSF takes bigger 
global-health role 


Relief agency sees mission expanding after Ebola outbreak. 


BY ERIKA CHECK HAYDEN, 
GENEVA, SWITZERLAND 


Joanne Liu, president of Médecins Sans
Frontières (MSF), is not overly concerned
with diplomacy. Participating in a panel in
Geneva, Switzerland, on 20 May with officials
from the United Nations, the World Health
Organization (WHO), Liberia and Sierra
Leone, she propped her head on her hand,
stared into space and rolled her eyes during
another speaker's remarks. When she spoke,
she excoriated the world for leaving West
Africa vulnerable to the largest Ebola epidemic
in history. “We're failing, guys,” she said.

Few would contest Liu’s right to make that
assertion. MSF (also known as Doctors Without
Borders) was the organization that alerted
the world to the scale of the Ebola epidemic.
Its speedy response has both reinforced its role
as the world’s caregiver in health crises and
catapulted it to new prominence in the international
health community. In the past year,
Liu has addressed the UN General Assembly
and met with world leaders. Donations to her 
non-governmental organization (NGO) rose 
to €1.14 billion (US$1.24 billion) last year; 
in the United States, donations climbed by 
50% from the previous year. “It’s a defining
moment,” Liu told Nature during an interview
at MSF’s headquarters in Geneva. “We have a
voice that we have never had before; we need
to use that very smartly.”

At a time when the WHO is lacking the 
funds and authority to address pressing global 
health needs, there is room for an organiza- 
tion such as MSF to take a greater role in both 
chronic and acute medical crises, as well as 
in research that enhances preparedness for 
those situations. But Liu insists that MSF can- 
not become “the world’s doctor”. “We need to 
be careful that we don’t spread ourselves too 
thin,” she says. 

MSF was founded in 1971 by French doctors 
and journalists who decried a Red Cross edict 
not to speak out about the conditions they saw
while treating victims of the Nigerian Civil
War in the secessionist state of Biafra. Since 
then, the organization has provided medi- 
cal services to people affected by wars, natu- 
ral disasters, famines and infectious-disease 
outbreaks around the world. In 1999, it was 
awarded the Nobel Peace Prize for “pioneer- 
ing humanitarian work on several continents”. 

It was the first international NGO to send 
staff to Guinea when Ebola emerged there in 
March 2014, and its declaration that month 
that the outbreak was “unprecedented” has 
proved tragically correct. Since then, MSF has 
deployed more than 1,300 international staff 
and 4,000 local people to fight Ebola in Guinea, 
Sierra Leone and Liberia. 


WEALTH OF EXPERIENCE 

MSF has fought Ebola outbreaks in nine 
countries, but it took a leadership role in the 
latest epidemic in ways that it had not before. It 
taught staff from other organizations — includ- 
ing the WHO and the US Centers for Disease 
Control and Prevention — how to treat people 
with Ebola. It distributed home disinfection 
kits to hundreds of thousands of people in 
Monrovia and other communities, and deliv- 
ered incinerators to dispose of bodies when 
Liberian burial teams could not keep pace. 

Independence — one of MSF’s core 
principles — allows the organization to move 
faster than governmental and inter-govern- 
mental organizations, but it has also caused 
tension. In July, for instance, MSF forbade 
Michael Gbakie, a disease-surveillance officer 
at Kenema Government Hospital in Sierra 
Leone, from visiting four of his colleagues who 
were being cared for in a nearby MSF Ebola 
treatment centre, even though he had a dec- 
ade of experience working around people with 
similar diseases. “They have a protocol, and 
they will not just allow everyone to go in there 
if they are not working with them,” Gbakie 
says. MSF eventually relented: Gbakie saw 
one of his colleagues, physician Sheik Humarr 
Khan, on the day he died. 

Health officials from other countries 
affected by the Ebola outbreak alluded to 
these tensions at the MSF-organized event on 
20 May: “I hope this outbreak will allow [you] 
to examine the way you work with your colleagues
and governments,” Miatta Gbanya,
coordinator of the Liberian Ebola response, 
told Liu. 

Liu acknowledges that the organization 
could have communicated better with local 
leaders and communities. “You need to get the 
community on board. This is something that 
we underestimated,” she says. 

The organization has undergone difficult 
transitions in its mission before. When it 
began treating people with HIV in the 1990s, 
it had to shift its approach from emergency 
medicine to delivering chronic care. Much 
more recently, it has found itself managing 
diabetes and other diseases found more in
middle-income nations in its treatment of
refugees from Syria's civil war. 

The Ebola crisis has not only strengthened 
MSF’s patient-care role, but also boosted its 
involvement in research. The NGO used pro- 
ceeds from its Nobel prize to help found the 
Drugs for Neglected Diseases initiative (see 
Nature 505, 142; 2014), which funds drug- 
development work on diseases that mainly 
affect poor people, and to start its Access 
Campaign, which pushes both to increase the 
availability of drugs and for the development of 
lower-cost medicines to treat illnesses in poor 
countries. 

In West Africa, MSF is running clinical trials 
of potential Ebola treatments, pushing for 
more research into the disease and contemplating
the creation of a biobank of patient samples
along with the WHO and other organizations. 


EXPANDED ROLE 

The temptation is for MSF to step in to fill a
void left by retreating funding and authority 
at the WHO and other international health 
organizations. Although WHO member states 
approved some measures to strengthen the 
organization's outbreak response during last 
month's World Health Assembly (see page 5), 
observers say that these measures will not 
address the core problems that slowed its 
response to Ebola in West Africa. 


“The Assembly failed utterly in addressing
the underlying deficiencies,” says Lawrence
Gostin, director of the WHO-affiliated Centers
for Law and the Public’s Health at Georgetown
University in Washington DC. “There’s a great
yearning on the part of WHO to be the global
health coordinator, but the future of WHO’s
leadership in this area is very much in doubt.”

The UN secretary-general and the World
Bank are examining the WHO’s mandate
in responding to health emergencies; the
World Bank has outlined details for a
Pandemic Emergency Facility that would
fund early responses to outbreaks.

But such proposals have tended to focus on 
preventing the spread of outbreaks from poor 
to rich nations, says epidemiologist David 
Heymann at the London School of Hygiene 
and Tropical Medicine. “The paradigm is, the 
donors will be happy to jump up and provide 
funding for a rapid outbreak response, but 
they’re not so ready to provide funding for 
health-systems strengthening,” he says.

MSF finds itself increasingly enlisted to 
clean up the local and regional health emergencies
that result from inadequate infrastructure.
A whiteboard at its international office
in Geneva tracks staff deployments to crises
in some of the 70 or so countries where it is 
currently working, including Ukraine, Iraq, 
the Democratic Republic of the Congo and 
South Sudan. 

To help the world to prepare for the next 
time that one of those local or regional situ- 
ations erupts into an international crisis, Liu 
is convening an open discussion in Dakar, 
Senegal, this month to which she plans to invite 
all those involved in the Ebola outbreak. It is 
one of several post-Ebola discussions under 
way on the global health response; others are 
being organized by the US Institute of Medi- 
cine, Harvard University in Cambridge, Mas- 
sachusetts, and the London School of Hygiene 
and Tropical Medicine, the World Bank and 
the UN. 

Many of these discussions are focusing on 
matters of international law, such as deficien- 
cies in the International Health Regulations, 
which are supposed to govern countries’ 
behaviour in health emergencies. Liu, who 
considers these questions less important than 
the practical concern of how to get treatment 
to people who need it, nonetheless says that she 
is looking forward to the upcoming MSF event. 

“We're going to get a lot of people who 
haven't treated a patient who are now the world 
experts, and who are going to give us lessons,” 
Liu says. “We can only smile at this.”

THE DISRUPTOR


BY HEIDI LEDFORD 


A powerful gene-editing technology is the biggest 
game changer to hit biology since PCR. But with its 
huge potential come pressing concerns. 


Three years ago, Bruce Conklin came across a method that made
him change the course of his lab.

Conklin, a geneticist at the Gladstone Institutes in San Francisco,
California, had been trying to work out how variations in DNA affect various
human diseases, but his tools were cumbersome. When he worked
with cells from patients, it was hard to know which sequences were important
for disease and which were just background noise. And engineering a
mutation into cells was expensive and laborious work. “It was a student's
entire thesis to change one gene,” he says.

Then, in 2012, he read about a newly published technique¹ called
CRISPR that would allow researchers to quickly change the DNA of nearly
any organism, including humans. Soon after, Conklin abandoned his
previous approach to modelling disease and adopted this new one. His lab
is now feverishly altering genes associated with various heart conditions.
“CRISPR is turning everything on its head,” he says.

The sentiment is widely shared: CRISPR is causing a major upheaval 
in biomedical research. Unlike other gene-editing methods, it is cheap, 
quick and easy to use, and it has swept through labs around the world as
a result. Researchers hope to use it to adjust human genes to eliminate diseases,
create hardier plants, wipe out pathogens and much more besides.
“I’ve seen two huge developments since I’ve been in science: CRISPR and
PCR,” says John Schimenti, a geneticist at Cornell University in Ithaca,
New York. Like PCR, the gene-amplification method that revolutionized
genetic engineering after its invention in 1985, “CRISPR is impacting the
life sciences in so many ways,” he says.

But although CRISPR has much to offer, some scientists are worried
that the field’s breakneck pace leaves little time for addressing the ethical
and safety concerns such experiments can raise. The problem was
thrust into the spotlight in April, when news broke that scientists had used
CRISPR to engineer human embryos (see Nature 520, 593-595; 2015).
The embryos they used were unable to result in a live birth, but the report²
has generated heated debate over whether and how CRISPR should be 
used to make heritable changes to the human genome. And there are other 
concerns. Some scientists want to see more studies that probe whether 
the technique generates stray and potentially risky genome edits; others 
worry that edited organisms could disrupt entire ecosystems. “This power 
is so easily accessible by labs — you don't need a very expensive piece of 
equipment and people don't need to get many years of training to do this,”
says Stanley Qi, a systems biologist at Stanford University in California.
“We should think carefully about how we are going to use that power.”


RESEARCH REVOLUTION 

Biologists have long been able to edit genomes with molecular tools. 
About ten years ago, they became excited by enzymes called zinc finger 
nucleases that promised to do this accurately and efficiently. But zinc 
fingers, which cost US$5,000 or more to order, were not widely adopted 
because they are difficult to engineer and expensive, says James Haber, a 
molecular biologist at Brandeis University in Waltham, Massachusetts. 
CRISPR works differently: it relies on an enzyme called Cas9 that uses a 
guide RNA molecule to home in on its target DNA, then edits the DNA 
to disrupt genes or insert desired sequences. Researchers often need to 
order only the RNA fragment; the other components can be bought off 
the shelf. Total cost: as little as $30. “That effectively democratized the 
technology so that everyone is using it,” says Haber. “It’s a huge revolution.” 

CRISPR methodology is quickly eclipsing zinc finger nucleases and 
other editing tools (see ‘The rise of CRISPR’). For some, that means
abandoning techniques they had taken years to perfect. “I'm depressed,” 
says Bill Skarnes, a geneticist at the Wellcome Trust Sanger Institute in 
Hinxton, UK, “but I’m also excited.” Skarnes had spent much of his career 
using a technology introduced in the mid-1980s: inserting DNA into 
embryonic stem cells and then using those cells to generate genetically 
modified mice. The technique became a laboratory workhorse, but it was 
also time-consuming and costly. CRISPR takes a fraction of the time, and 
Skarnes adopted the technique two years ago. 

Researchers have traditionally relied heavily on model organisms 
such as mice and fruit flies, partly because they were the only species 
that came with a good tool kit for genetic manipulation. Now CRISPR 
is making it possible to edit genes in many more organisms. In April, for 
example, researchers at the Whitehead Institute for Biomedical Research 
in Cambridge, Massachusetts, reported using CRISPR to study Candida 
albicans, a fungus that is particularly deadly in people with weakened 
immune systems, but had been difficult to genetically manipulate in the 
lab³. Jennifer Doudna, a CRISPR pioneer at the University of California,
Berkeley, is keeping a list of CRISPR-altered creatures. So far, she has three 
dozen entries, including disease-causing parasites called trypanosomes 
and yeasts used to make biofuels. 

Yet the rapid progress has its drawbacks. “People just don't have the 
time to characterize some of the very basic parameters of the system,” 
says Bo Huang, a biophysicist at the University of California, San Fran- 
cisco. “There is a mentality that as long as it works, we don’t have to 
understand how or why it works.” That means that researchers occasion- 
ally run up against glitches. Huang and his lab struggled for two months 
to adapt CRISPR for use in imaging studies. He suspects that the delay 
would have been shorter had more been known about how to optimize 
the design of guide RNAs, a basic but important nuance. 

By and large, researchers see these gaps as a minor price to pay for a
powerful technique. But Doudna has begun to have more serious concerns
about safety. Her worries began at a meeting at which
she saw a postdoc present work in which a
virus was engineered to carry the CRISPR components into mice. The
mice breathed in the virus, allowing the CRISPR system to engineer mutations
and create a model for human lung cancer⁴. Doudna got a chill; a
minor mistake in the design of the guide RNA could result in a CRISPR 
that worked in human lungs as well. “It seemed incredibly scary that you 
might have students who were working with such a thing,” she says. “It’s 
important for people to appreciate what this technology can do.” 




Andrea Ventura, a cancer researcher at Memorial Sloan Kettering 
Cancer Center in New York and a lead author of the work, says that his 
lab carefully considered the safety implications: the guide sequences were 
designed to target genome regions that were unique to mice, and the virus 
was disabled such that it could not replicate. He agrees that it is important 
to anticipate even remote risks. “The guides are not designed to cut the 
human genome, but you never know,” he says. “It’s not very likely, but it
still needs to be considered.”


EDITING OUT DISEASE 

Last year, bioengineer Daniel Anderson of the Massachusetts Institute of 
Technology in Cambridge and his colleagues used CRISPR in mice to correct
a mutation associated with a human metabolic disease called tyrosinaemia⁵.
It was the first use of CRISPR to fix a disease-causing mutation
in an adult animal — and an important step towards using the technology 
for gene therapy in humans. 

The idea that CRISPR could accelerate the gene-therapy field is a 
major source of excitement in scientific and biotechnology circles. But as 
well as highlighting the potential, Anderson’s study showed how far there
is to go. To deliver the Cas9 enzyme and its guide RNA into the target 
organ, the liver, the team had to pump large volumes of liquid into blood 
vessels — something that is not generally considered feasible in people. 
And the experiments corrected the disease-causing mutation in just 0.4% 
of the cells, which is not enough to have an impact on many diseases. 

Over the past two years, a handful of companies have sprung up to 
develop CRISPR-based gene therapy, and Anderson and others say that 
the first clinical trials of such a treatment could happen in the next one 
or two years. Those first trials will probably be scenarios in which the 
CRISPR components can be injected directly into tissues, such as those 
in the eye, or in which cells can be removed from the body, engineered in 
the lab and then put back. For example, blood-forming stem cells might be 
corrected to treat conditions such as sickle-cell disease or β-thalassaemia.
It will be a bigger challenge to deliver the enzyme and guide RNA into 
many other tissues, but researchers hope that the technique could one day 
be used to tackle a wider range of genetic diseases. 

Yet many scientists caution that there is much to do before CRISPR 
can be deployed safely and efficiently. Scientists need to increase the 
efficiency of editing, but at the same time make sure that they do not 
introduce changes elsewhere in the genome that have consequences for 
health. “These enzymes will cut in places other than the places you have 
designed them to cut, and that has lots of implications,” says Haber. “If
you're going to replace somebody’s sickle-cell gene in a stem cell, you’re
going to be asked, ‘Well, what other damage might you have done at other
sites in the genome?’”

Keith Joung, who studies gene editing at Massachusetts General Hospi- 
tal in Boston, has been developing methods to hunt down Cas9’s off-target 
cuts. He says that the frequency of such cuts varies widely from cell to 
cell and from one sequence to another: his lab and others have seen off- 
target sites with mutation frequencies ranging from 0.1% to more than 
60%. Even low-frequency events could potentially be dangerous if they 
accelerate a cell's growth and lead to cancer, he says. 
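
To make the idea of an off-target site concrete, here is a minimal sketch of how candidate sites might be listed by sequence similarity alone. It is not the method used by Joung's lab, and it says nothing about how often a site is actually cut. It assumes the commonly used Cas9 from Streptococcus pyogenes, which needs an 'NGG' motif (the PAM) immediately after the 20-letter target; that requirement is background knowledge rather than something stated in this article. The sequences, function names and mismatch threshold are all hypothetical, and only one DNA strand is scanned.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome: str, guide: str, max_mismatches: int = 3):
    """List (position, site, mismatches) for windows that resemble the guide.

    A window counts only if it is followed by an NGG motif and differs
    from the guide at no more than max_mismatches positions.
    """
    hits = []
    n = len(guide)
    # Each window needs n letters plus 3 more for the PAM.
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mismatches = hamming(site, guide)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Toy data: one perfect target and one look-alike with two mismatches.
guide = "GACGTTACCGGATTCAGCTA"
look_alike = "TACGTAACCGGATTCAGCTA"
genome = "TTT" + guide + "TGG" + "AAAA" + look_alike + "AGG" + "CC"

hits = find_candidate_sites(genome, guide)
for position, site, mismatches in hits:
    print(position, site, mismatches)
```

A site with zero mismatches is the intended target; any others are places where stray cuts are at least conceivable and would need to be checked experimentally.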

With so many unanswered questions, it is important to keep 
expectations of CRISPR under control, says Katrine Bosley, chief execu- 
tive of Editas, a company in Cambridge, Massachusetts, that is pursuing 
CRISPR-mediated gene therapy. Bosley is a veteran of commercializing 
new technologies, and says that usually the hard part is convincing others 
that an approach will work. “With CRISPR it’s almost the opposite,” she
says. “There’s so much excitement and support, but we have to be realistic
about what it takes to get there.”


CRISPR ON THE FARM 

While Anderson and others are aiming to modify DNA in human cells, 
others are targeting crops and livestock. Before the arrival of
gene-editing techniques, this was generally done by inserting a gene into
the genome at random positions, along with sequences from bacteria,
viruses or other species that drive expression of the gene. But the pro- 
cess is inefficient, and it has always been fodder for critics who dislike 
the mixing of DNA from different species or worry that the insertion 
could interrupt other genes. What is more, getting genetically modi- 
fied crops approved for use is so complex and expensive that most of 
those that have been modified are large commodity crops such as maize 
(corn) and soya beans. 

With CRISPR, the situation could change: the ease and low cost may 
make genome editing a viable option for smaller, speciality crops, as 
well as animals. In the past few years, researchers have used the method 
to engineer petite pigs and to make disease-resistant wheat and rice. 
They have also made progress towards engineering dehorned cattle, 
disease-resistant goats and vitamin-enriched sweet oranges. Doudna 
anticipates that her list of CRISPR-modified organisms will grow. 
“There's an interesting opportunity to consider doing experiments or 
engineering pathways in plants that are not as important commercially 
but are very interesting from a research perspective — or for home 
vegetable gardens,” she says. 

CRISPR’s ability to precisely edit existing DNA sequences makes 
for more-accurate modifications, but it also makes it more difficult 
for regulators and farmers to identify a modified organism once it 
has been released. “With gene editing, there’s no longer the ability to 
really track engineered products,” says Jennifer Kuzma, who studies 
science policy at North Carolina State University in Raleigh. “It will 
be hard to detect whether something has been mutated conventionally 
or genetically engineered.”

That rings alarm bells for opponents of genetically modified crops, 
and it poses difficult questions for countries trying to work out how to 
regulate gene-edited plants and animals. In the United States, the Food 
and Drug Administration has yet to approve any genetically modified 
animal for human consumption, and it has not yet announced how it 
will handle gene-edited animals. 

Under existing rules, not all crops made by genome editing would 
require regulation by the US Department of Agriculture (see Nature 
500, 389-390; 2013). But in May, the agriculture department began 
to seek input on how it can improve regulation of genetically modi- 
fied crops — a move that many have taken as a sign that the agency is 
re-evaluating its rules in light of technologies such as CRISPR. “The 
window has been cracked,” says Kuzma. “What goes through the window
remains to be seen. But the fact that it’s even been cracked is
pretty exciting.” 


ENGINEERED ECOSYSTEMS 

Beyond the farm, researchers are considering how CRISPR could or 
should be deployed on organisms in the wild. Much of the attention 
has focused on a method called gene drive, which can quickly sweep 
an edited gene through a population. The work is at an early stage, but 
such a technique could be used to wipe out disease-carrying mosquitoes 
or ticks, eliminate invasive plants or eradicate herbicide resistance in 
pigweed, which plagues some US farmers. 

Usually, a genetic change in one organism takes a long time to spread 
through a population. That is because a mutation carried on one of a
pair of chromosomes is inherited by only half the offspring. But a gene 
drive allows a mutation made by CRISPR on one chromosome to copy 
itself to its partner in every generation, so that nearly all offspring will 
inherit the change. This means that it will speed through a popula- 
tion exponentially faster than normal (see ‘Gene drive’) — a mutation 
engineered into a mosquito could spread through a large population 
within a season. If that mutation reduced the number of offspring a 
mosquito produced, then the population could be wiped out, along with 
any malaria parasites it is carrying. 
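
The difference between ordinary inheritance and a gene drive can be sketched with a simple allele-frequency model. The sketch assumes random mating, an effectively infinite population and no fitness cost from the edit, none of which is claimed in the article. The `conversion` parameter (the chance that the drive copies itself onto the partner chromosome in a carrier with one copy) and the 1% starting frequency are illustrative assumptions.

```python
def next_frequency(q: float, conversion: float) -> float:
    """Frequency of the edited allele after one generation of random mating.

    q is the current frequency of the edited allele. conversion is the
    assumed probability that a one-copy carrier's wild-type chromosome
    is converted to the edited version.
    """
    # Two-copy carriers (q * q) always pass the allele on. One-copy
    # carriers (2 * q * (1 - q)) pass it on with probability
    # (1 + conversion) / 2, which is 1/2 under ordinary inheritance.
    return q * q + 2 * q * (1 - q) * (1 + conversion) / 2


def simulate(q0: float, conversion: float, generations: int) -> list[float]:
    """Allele frequency in each generation, starting from q0."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], conversion))
    return freqs


# Start with 1% of chromosomes edited and follow ten generations.
mendelian = simulate(0.01, conversion=0.0, generations=10)
drive = simulate(0.01, conversion=1.0, generations=10)

for generation, (m, d) in enumerate(zip(mendelian, drive)):
    print(f"generation {generation:2d}: ordinary {m:.4f}   gene drive {d:.4f}")
```

In this idealized model the edited allele stays at its starting frequency under ordinary inheritance, whereas with perfect conversion the share of unedited chromosomes is squared each generation, so the edit approaches fixation within about ten generations.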

But many researchers are deeply worried that altering an entire 
population, or eliminating it altogether, could have drastic and unknown 
consequences for an ecosystem: it might mean that other pests emerge, 
for example, or it could affect predators higher up the food chain. And

THE RISE OF CRISPR


DNA sequences called CRISPRs (clustered regularly interspaced short palindromic 
repeats) are part of a bacterial defence system. After researchers showed in 2012 
that CRISPRs could be used to edit genomes, use of the tools quickly spread, as 
reflected by sharp rises in publications, patent applications and funding. 


PUBLICATIONS 


The number of papers about CRISPR has outstripped the numbers mentioning 
the gene-editing technologies known as TALENs and zinc fingers.

PATENTS


In 2014, worldwide patent applications that mention 
CRISPR leapt and a patent battle intensified.

FUNDING


A sharp jump in US National Institutes of Health funding for 
projects involving CRISPR is a harbinger of future advances. 


CRISPR funding in 
2014 lagged behind 
the nearly $160 million 
for iPS-cell research.

December 1987
Researchers find 
CRISPR sequences 

in Escherichia coli, 

but do not characterize 
their function®. 


July 1995 

CRISPR sequences are 
found to be common in 
other microbes?. 


March 2007 
Scientists at food 
company Danisco 
determine that the 
repeats are part of a 
bacterial defence 
against viruses”. 


June 20122 
Researchers report 
that CRISPR can be 
used to perform 
genome editing'. 


January 2013 
CRISPR is used in 
mouse and human cells, 
fuelling rapid uptake of 
the technique by 
researchers!3, 


March 2013 
The University of 
California and others 
file for a patent on 
the findings?. 


April 2014 

MIT and the Broad 
Institute are granted a 
patent on CRISPR gene 
editing, sparking a 
fierce patent battle. 


March 2015 
Report of the first 
CRISPR gene drive, 
which can spread an 
edited gene rapidly 
through a population®. 


April 2015 
Researchers report 
that they have edited 
human embryos with 
CRISPR, triggering an 
ethical debate².

CRISPR gene editing can be used to propagate a genetic
modification rapidly through generations. It might be used
to eradicate a population of disease-carrying mosquitoes.

GENE-DRIVE INHERITANCE


The gene-drive system cuts the partner chromosome, then the 
repair process copies the modification to this chromosome. 


Figure labels: Wild-type mosquito. Mosquito with modified gene + gene drive. Nearly 100% of offspring inherit the modified gene. Modified gene sweeps rapidly through population.

researchers are also mindful that a guide RNA could mutate over time
such that it targets a different part of the genome. This mutation could 
then race through the population, with unpredictable effects. 

“It has to have a fairly high pay-off, because it has a risk of irre- 
versibility — and unintended or hard-to-calculate consequences for 
other species,” says George Church, a bioengineer at Harvard Medical
School in Boston. In April 2014, Church and a team of scientists and 
policy experts wrote a commentary in Science⁶ warning researchers
about the risks and proposing ways to guard against accidental release 
of experimental gene drives. 

At the time, gene drives seemed a distant prospect. But less than 
a year later, developmental biologist Ethan Bier of the University of 
California, San Diego, and his student Valentino Gantz reported that 
they had designed just such a system in fruit flies⁷. Bier and Gantz had
used three layers of boxes to contain their flies and adopted lab safety 
measures usually used for malaria-carrying mosquitoes. But they did 
not follow all the guidelines urged by the authors of the commentary, 
such as devising a method to reverse the engineered change. Bier says 
that they were conducting their first proof-of-principle experiments, 
and wanted to know whether the system worked at all before they 
made it more complex. 

For Church and others, this was a clear warning that the democrati- 
zation of genome editing through CRISPR could have unexpected and 
undesirable outcomes. “It is essential that national regulatory authori- 
ties and international organizations get on top of this — really get on 
top of it,” says Kenneth Oye, a political scientist at the Massachusetts 
Institute of Technology and lead author of the Science commentary. 
“We need more action.” The US National Research Council has formed 
a panel to discuss gene drives, and other high-level discussions are 
starting to take place. But Oye is concerned that the science is moving 
at lightning speed, and that regulatory changes may happen only after 
a high-profile gene-drive release. 

The issue is not black and white. Micky Eubanks, an insect ecologist 
at Texas A&M University in College Station, says that the idea of gene 
drives shocked him at first. “My initial gut reaction was ‘Oh my god, this
is terrible. It’s so scary’,” he says. “But when you give it more thought and
weigh it against the environmental changes that we have already made 
and continue to make, it would be a drop in the ocean.” 

Some researchers see lessons for CRISPR in the arc of other new tech- 
nologies that prompted great excitement, concern and then disappoint- 
ment when teething troubles hit. Medical geneticist James Wilson of the 
University of Pennsylvania in Philadelphia was at the centre of booming 
enthusiasm over gene therapy in the 1990s — only to witness its down- 
fall when a clinical trial went wrong and killed a young man. The field 
went into a tailspin and has only recently begun to recover. The CRISPR 
field is still young, Wilson says, and it could be years before its potential 
is realized. “It’s in the exploration stage. These ideas need to ferment.” 

Then again, Wilson has been bitten by the CRISPR bug. He says that 
he was sceptical of all the promises being made about it until his own lab 
began to play with the technique. “It’s ultimately going to have a role in 
human therapeutics,” he says. “It’s just really spectacular.”


Heidi Ledford is a senior reporter for Nature in Cambridge, 
Massachusetts. 
BILLION-DOLLAR
BIOTECH 


Moderna Therapeutics has big ambitions and a bankroll to match. 
How a fledgling start-up became one of the most highly valued private drug firms ever. 


At a breakfast meeting two-and-a-half years
ago, Pascal Soriot, the newly minted chief
executive of pharmaceutical giant AstraZeneca,
shook hands on the first major
drug-development deal of his tenure. It was 
a research partnership with little-known bio- 
technology company Moderna Therapeutics 
of Cambridge, Massachusetts. Worth up to 
US$420 million, the deal was unusually large 
for a start-up that offered only a fledgling drug 
technology, especially one that had not yet even 
been tested in humans. 

That was the first of many huge cheques for 
Moderna. This January alone, the company 
announced a record $500 million in financ- 
ing from a handful of investors, pushing it over 
the $1-billion fund-raising mark and making it 
the most highly valued venture-backed private 
company in drug development today. 

“Everybody is talking about this,” says 
Johannes Fruehauf, who runs LabCentral, an 
incubator and shared laboratory facility in the 
bustling Cambridge biotechnology hub known 
as Kendall Square. “It’s inevitable with these 
large, eye-popping numbers.” 

Investors are clearly attracted to Moderna’s 
technology, which aims to use chemically 
modified messenger RNA (mRNA) molecules 
to produce any protein that the body might 
need. Backers have also bought into the reputa- 
tion of the company’s high-profile co-founders 
and its charismatic chief executive, whose bold 
ambition is to move 100 drugs into clinical 
testing within the next decade, treating every- 
thing from cancer to rare genetic diseases. 
But Moderna is also something of a mystery.
As a private firm, it has revealed very little of
its research. Its academic founders have published
only one study using Moderna’s mRNA
therapeutics technology in rodents. And the 
company itself has disclosed scientific details 
(including some about early work in non- 
human primates) only through patent filings. 
Add in questions about the strength of Mod- 
erna’s patent position and the troubled history 
of other RNA-based drugs, and some analysts 
are wondering whether the company will be 
able to deliver on its promises. 

“I don’t think they’ve really overcome the
critical issues,” says Dirk Haussecker, an RNA-therapeutics
consultant in Rastatt, Germany.
Based on the publicly available records, he says,
“I haven’t seen anything from Moderna that
makes me say, ‘Oh, they really have a competitive
edge or they’re very different, in a league
of their own’. From a science point of view, it
doesn’t seem to make sense.” But as a business
it is surging ahead.


A SIMPLE APPROACH 

On paper, the idea of mRNA therapy seems 
simple. If someone cannot produce enough of 
a certain protein, or produces a broken version, 
doctors could inject their cells with mRNA that 
codes for a replacement protein. This would 
avoid the risks of tinkering with the genome 
permanently, as is done in some forms of gene 
therapy. And whereas growth factors, antibodies
and other complex ‘biological’ drugs can be
produced in vats by bioengineered cells, these
are mostly limited to secreted molecules. An
mRNA-based therapy would be able to make 
proteins that operate inside the cell as well. 
“mRNA delivery would reinvent how we as 
an industry tackle many diseases,” says Peter 
Kolchinsky, managing partner of RA Capital 
Management in Boston, Massachusetts, which 
is one of the latest investors in Moderna. 

But delivery is tricky. In the early 1990s, 
scientists first demonstrated that injected 
mRNA could generate proteins in mice and
rats. But protein production was low and transient,
and the mRNA seemed too unstable to
make a suitable drug. Years later, researchers 
also realized that lab-synthesized mRNA tends 
to spur an immune attack after it is injected, 
triggering potentially dangerous inflammatory 
responses. So a handful of researchers started 
working their way around the body's defences 
by modifying the RNA. 

Moderna traces its origins to one such effort, 
in the laboratory of Derrick Rossi. A stem-cell 
biologist at Boston Children’s Hospital, Rossi 
and his postdoc Luigi Warren were trying to 
use mRNA to coax cells into a ‘pluripotent’ 
state, capable of giving rise to many cell types. To 
avoid triggering inflammation, the researchers 
replaced some of the RNA’s molecular building
blocks — the nucleosides uridine and cytidine 
— with pseudouridine and 5-methylcytidine. 
This makes the RNA look more like something 
that the cell would produce itself, because invad- 
ers such as bacteria cannot usually make these 
modifications to their own mRNA. 

It worked. In 2010, Rossi and Warren filed to
patent their method for making stem cells and
later published the results of their research.

The work caught the attention of Robert 
Langer, a respected bioengineer and serial 
entrepreneur from the Massachusetts Insti- 
tute of Technology in Cambridge, and Noubar 
Afeyan, chief executive of Cambridge biotech 
investment firm Flagship Ventures. Both men 
immediately saw the sweeping potential of the 
modified mRNA. The idea of side-stepping 
the cell's defences “was intriguing instantaneously”,
says Afeyan, who now chairs Moderna’s
board of directors. 

Rossi and Langer brought in a third academic 
co-founder — cardiovascular biologist Kenneth 
Chien, formerly at Harvard Medical School in 
Boston and now at the Karolinska Institute in 
Stockholm — and together they launched Mod- 
erna in September 2010. The name was Rossi's 
invention, a portmanteau of modified and RNA. 

There was just one problem. “Our paper 
really put the whole thing on the map but, iron- 
ically, our paper didn't have anything really to 
do with mRNA therapeutics,” says Warren, who 
now runs Stemiotics, a company in San Diego, 
California, that makes custom-order stem cells 
using modified mRNA. The modified RNAs 
were not even their innovation. 

They got the idea from Katalin Karikó and
Drew Weissman at the University of Pennsyl- 
vania in Philadelphia (UPenn). In two papers 
that largely fell under the radar at the time, 
these scientists showed that using pseudo- 
uridine and 5-methylcytidine made mRNA 
nearly invisible to cellular defences, both
in vitro and in mice. In 2005, the pair started
filing to patent the technology for therapeutic 
purposes. 


DIFFICULT DEALINGS 

Karikó and Weissman created a company called
RNARx, which received close to $900,000 in
small-business grants from the US government.
In mice and monkeys they showed that regular
mRNA injections could boost production of
erythropoietin, a hormone that is prescribed to 
treat some forms of anaemia. 

The company’s research efforts ended there, 
however, in part because of disagreements 
between the researchers and the University of 
Pennsylvania over the licensing of their intel- 
lectual property (IP). The university eventually 
sold the licence to Cellscript, a firm in Madison, 
Wisconsin, for an undisclosed sum. Cellscript 
has mostly used the rights to market kits for 
making mRNAs with modified nucleosides, 
but chief executive Gary Dahl says that the 
company also has “an interest in therapeutics”. 
He declined to discuss specifics. 

Karikó and Weissman’s patent posed a challenge
for Moderna. A 2010 internal report
from Flagship Ventures, which was nurturing 
Moderna into existence at the time, states that 
if scientists could not identify alternatives to 
pseudouridine and 5-methylcytidine, “our 
company technology may be limited to licens- 
ing IP from UPenn”. 

Moderna needed to find a way around the 
patent, and the task fell to its first employee, 
Jason Schrum. A nucleic-acid biochemist by 
training, Schrum set to work testing different
types of modified nucleoside. He bought RNA- 
expression kits from Cellscript and assembled 
an array of nucleoside analogues, some of 
which he designed. 

Most of the modified nucleosides were not 
up to the job. But Schrum found one, a vari- 
ant of pseudouridine called 1-methylpseudo- 
uridine, that seemed to do the trick. According 
to Schrum, mRNA with this nucleoside pro- 
duced even higher levels of protein expression 
with less inflammation than did the mRNA in 
Karikó and Weissman's papers. Last year, the US
Patent and Trademark Office granted Moderna 
patents covering the use of 1-methylpseudo- 
uridine, among other nucleosides — but the 
University of Pennsylvania also received a pat- 
ent that covers many of the same nucleosides. 

Several other mRNA-therapeutics companies 
say that they have proprietary formulations of 
modified RNA molecules as well, although few 
are willing to discuss details. “In mRNAs, everything
is deathly quiet,” says Ali Mortazavi, chief
executive of Silence Therapeutics, an RNA bio- 
tech in London. “There’s really no understand- 
ing of who owns what, so nobody wants to 
disclose anything — and we're included in that.” 

Karikó, who now works at the German
mRNA-therapeutics firm BioNTech in Mainz, 
points to early “signs that there will be a fierce 
battle for licensing” — and not just in the 
United States. Last year, the European Patent 
Office received two anonymous letters challenging
the validity of Karikó and Weissman’s
patent application covering modified mRNA;
US authorities granted the patent in 2012, but a
decision is still pending in the European Union. 

The uncertainties over intellectual property 
have clearly not dissuaded Moderna’s inves- 
tors. Kolchinsky says that patent disputes may 
be painful and expensive, but they eventually 
resolve. “Companies that enable such break- 
throughs typically have the resources to fend off 
baseless claims, and settle, on reasonable terms, 
the ones that turn out to be legitimate,” he says. 

Moderna also has time on its side. Flush with 
cash — the company has an estimated $900 mil- 
lion in the bank — it can continue to sign on 
pharmaceutical partners and outspend its rivals 
on science. This year alone, Moderna plans to 
spend between $150 million and $180 million 
on research and development — more than any 
other mRNA drug-maker. 

“They've created this air of inevitability,” 
says Fruehauf. “It’s a good strategy.” 


FUND-RAISER-IN-CHIEF 

Much of that momentum boils down to one 
man: chief executive Stéphane Bancel. “He's a
damn good salesman,” says Justin Quinn, a staff
scientist who worked at Moderna until 2012. 

Bancel joined the company in July 2011 
after leading the diagnostics firm bioMérieux 
of Marcy-l’Étoile, France, for five years. Afeyan
had repeatedly tried to recruit Bancel to run 
Flagship-launched companies, but Bancel was 
not interested in most of the projects — start- 
ups that tended to focus on one lead product 
in one disease area. 

Moderna was different: it promised to 
reinvent the drug industry. And for Bancel, a 
smooth-talking businessman with a penchant 
for stylish, slim-fitting clothing, “it was worth 
taking a career risk and a massive pay cut to go 
to a start-up if it had the potential to be something
really big”, he says.

Bancel quickly set to work on raising capi- 
tal — with great success — but some question 
his tactics. In the opinion of a former staff scientist
(who requested anonymity), Bancel used
his charisma and connections, as well as the 
clout of the company’s co-founders, to convince 
investors and partners of the uniqueness of the 
Moderna platform, while glossing over any pos- 
sible holes in its intellectual property. “He did 
a tremendous job of persuading people to give 
the company money for technology that was not 
100% theirs,” the ex-employee says. 

In response, Bancel says that of course inves- 
tors in Moderna did their due diligence before 
writing cheques: “Companies are a bit more
sophisticated than that.”

He and other Moderna executives also 
acknowledge the seminal contributions made 
by Karikó, Weissman and others. But Tony de
Fougerolles, who was Moderna’s first chief scientific
officer and now leads research efforts
at Ablynx in Ghent, Belgium, argues that 
such early work was largely academic, and 
that Moderna approached the research “from 
a pharmaceutical perspective”. Moreover, 
Bancel says that Moderna’s technology has now
advanced to the point that the company’s initial
patent filings are “irrelevant”. “This is Moderna
generation 1.0, and we're at 6.0 now,” he says.
Moderna no longer relies on 1-methylpseudo- 
uridine in its mRNAs, for example. 

And modified nucleoside chemistry is just 
one part of what goes into building an mRNA 
drug. Another crucial aspect involves working 
out how to get the mRNA into specific cells and
tissues in the body, a challenge that continues
to vex the related field of RNA-interference
therapeutics, which emerged more than a decade
ago but has had few clinical successes. “The
key for messenger RNA is going to be delivery,”
says Joseph Payne, president and chief execu- 
tive of Arcturus Therapeutics in San Diego, one 
of many drug developers working on nano- 
particle-based delivery of mRNA therapeu- 
tics. “That's really the rate-limiting step,” adds 
Haussecker. 

Bancel says that Moderna is exploring sev- 
eral delivery technologies through its in-house 
team and partnerships with others — although 
he would not divulge details of the company’s 
approach. “People will figure out in 18 months 
where we are now when they see the patents,” 
he says. Although at that point, he adds, even 
those methods will probably be out of date. 


THE BEAST 
At its sleek Cambridge headquarters, Moderna
is equipping itself with the best laboratories that 
money can buy. In the middle of a third-floor 
lab sits “the beast”, as Bancel calls it: a suite of 
robots that can make up to 50 lots of therapeu- 
tic mRNA per day for testing in non-human 
primates. Moderna also plans to open a facility 
for making human-grade mRNA later this year. 
Its resources have allowed the company to 
launch more than 50 drug-development pro- 
grammes, mostly through external pharmaceu- 
tical partners, but also at three wholly-owned 
spin-offs: Onkaido, Valera and Elpidera, which 
focus on oncology, infectious diseases and rare 
diseases, respectively. Bancel says that Valera 
will be first to the clinic, with an mRNA drug 
that targets an undisclosed infectious disease. 
“By the end of 2016, we will have trials for all 
the therapeutic areas we are in today,” he says.

But clinical success is by no means guaran- 
teed. “It will probably be like the technologies 
before it,” says James McSwiggen, an inde- 
pendent biotechnology consultant who has 
worked with Moderna in the past. Other RNA- 
based drugs, such as antisense therapies, RNA 
interference and, most recently, microRNA, 
have all gone through periods of industry exu- 
berance. These are generally followed by years 
wrestling with scientific realities before the 
technologies begin to show their true clinical 
promise. “I suspect that the same will happen” 
with mRNA, says McSwiggen. “If any company
can weather that boom-bust bit, I would
imagine that, given the amount of money that
they’ve raised, Moderna should.”

Other mRNA-therapeutics companies are 
persevering, and are getting promising data 
from studies in large animals. CureVac, a German
company that spun off from the University
of Tübingen in 2000, has found that it can get
injected mRNA past the immune defences of 
pigs and monkeys by picking molecules with 
optimal sequences rather than by modifying 
their nucleosides. So far, CureVac has struck
deals with several big pharmaceutical compa- 
nies and raised around $220 million in equity, 
including $52 million secured from the Bill 
& Melinda Gates Foundation in March this year. 

Dublin-based rare-disease specialist Shire, 
in collaboration with Ethris of Planegg, Ger- 
many, has achieved targeted lung delivery of 
mRNA in a pig model for cystic fibrosis. “For a
huge idea” like mRNA, says Michael Heartlein,
head of mRNA therapeutics at Shire, “I think
there's a lot of room for different technologies 
and different players”. 

But Bancel’s ambition is for Moderna to 
grow so fast and so big that the competition 
simply has no chance. “We want to be the com- 
pany that, if you want to make an mRNA drug 
five years from now, you pick up the phone 
and you call Moderna,” he says. “Think about
it: if you're going to put $50 or $100 million 
into mRNA, do you want to put it into your 
own team, starting four years behind, and with 
all the IP issues? Or do you want to pile it on 
$900 million of someone else’s money?” 

As for the naysayers and critics, Bancel says, 
“I understand people are not happy. I understand
people are jealous. I understand all that.
It’s life.”


Elie Dolgin is a science writer in Somerville, 
Massachusetts. 
ILLUSTRATION BY MIKEL JASO


Prepare for unexpected 
prenatal test results 


Women are learning about their own health problems through fetal screening. 
Revise consent forms and raise awareness, urges Diana W. Bianchi. 


A healthy pregnant woman has a blood
test to rule out the possibility that
her baby has certain abnormalities,
such as Down's syndrome. One week later, a 
genetic counsellor calls her and recommends 
a follow-up test such as amniocentesis. 
When the counsellor calls again, she says 
that the baby is healthy but that the mother 
needs to be screened for cancer. 

Since 2011, clinicians have been able to 
analyse the genome of a fetus by sequenc- 
ing DNA fragments found floating in the 
mother’s blood. With the use of these non-invasive
prenatal tests soaring (see ‘Test scores’),
mothers are increasingly facing
unexpected, ‘incidental’ findings about their 
own health. As of late 2014, at least 26 pregnant
women with abnormal blood-test
results later learned that they had cancer.
In 10 of them, the prenatal tests prompted
the medical assessments that revealed this; in 
the other 16, the cancers were not discovered 
until the mothers developed symptoms. 
Parents, obstetricians and physicians have 
been taken by surprise. Consent forms used 
by test providers rarely mention the possi- 
bility of findings concerning the mother’s 
health. And caregivers have little guidance 
on what to do when such findings arise. 
Test providers need to rethink their 
consent forms to prevent unwarranted 
confusion and anxiety — not least, women 
deciding to terminate their pregnancies on 
the basis of wrong interpretations of test 
results. And professional societies, such
as the American College of Medical Genet- 
ics and Genomics (ACMG), the American 
College of Obstetricians and Gynecologists 
and the Society for Maternal-Fetal Medicine
(SMFM), need to take the lead on providing 
education and clinical guidance. 


FRAGMENTS THAT FOOL 

These latest screening tests extract 
fragments of maternal and placental DNA 
(a proxy for fetal DNA) from the mother’s 
blood. The fragments are sequenced and 
aligned to specific parts of a standard ‘ref- 
erence human genome’. (In some cases, 
the reference genome is obtained from the 
mother’s white blood cells.) By comparing 
the number of mapped fragments to the 
number expected to align, investigators can 
check whether there are too few or too many 
chromosomes (or parts of them) in the cells 
from which the fragments originated. If the 
initial analysis indicates an anomaly, 
a follow-up diagnostic procedure is
strongly recommended. 
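
The counting approach described above can be sketched in a few lines. This is a minimal illustration and not any test provider's actual pipeline: it assumes that fragments have already been mapped and tallied per chromosome, and that the expected share of fragments comes from a reference set of unaffected pregnancies. The function names, the example numbers and the cut-off of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev


def chromosome_fraction(counts, chrom):
    """Share of all mapped fragments that align to one chromosome."""
    total = sum(counts.values())
    return counts[chrom] / total


def z_score(sample_fraction, reference_fractions):
    """How far a sample's fraction lies from the reference mean,
    measured in reference standard deviations."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)
    return (sample_fraction - mu) / sigma


if __name__ == "__main__":
    # Invented chromosome-21 fractions from unaffected reference samples.
    reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]
    # Invented fragment counts for the sample being screened.
    sample_counts = {"chr21": 14_000, "all_other": 986_000}
    z = z_score(chromosome_fraction(sample_counts, "chr21"), reference)
    # Illustrative cut-off only: flag the sample for follow-up testing.
    print("flag for follow-up" if abs(z) > 3 else "no anomaly flagged")
```

Because the fragments come from both mother and placenta, an excess or shortfall flagged in this way does not say which of the two it originated from, which is why a follow-up diagnostic procedure is needed.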

The low rate of false-positive results from
these blood tests (around 0.2%, down from
the roughly 5% for older screening methods)
has greatly reduced the need for invasive
follow-up tests such as chorionic villus sampling
(CVS) or amniocentesis.

In some cases, an initial analysis indicates 
an anomaly, but a follow-up procedure 
shows that the number of chromosomes in
the fetal cells is normal. There are several
possible explanations: a twin might have 
died in the womb or developmental glitches 
may have caused clusters of abnormal cells 
in the placenta, a condition called confined 
placental mosaicism. A third explanation is 
a health problem in the mother. 

Some women have discovered that they
have a sex-chromosome abnormality that is
associated with reduced fertility. Some have
found out that they have DiGeorge syndrome,
a genetic abnormality associated with learning
difficulties, immune problems and congenital
heart defects. Others have been told
that some of their cells contain an abnormal
number of chromosomes. And increasingly,
imbalances in the number of chromosome
copies have flagged the presence of a tumour.

Clinicians have yet to discover all that 
non-invasive prenatal testing can reveal 
about mothers. In my research, I have 
encountered three separate cases in which, 
on the basis of a maternal blood test, care- 
givers informed pregnant women that they 
would be having boys. After ultrasound 
images later showed that all three were 
pregnant with girls, it emerged that the 
Y chromosome sequences in the mothers’ 
blood originated from transplanted organs 
that they had received from men.

Although the commercial providers of the 
tests are striving to obtain data on follow-up 
assessments, doing so is hard, so the true 
extent of such incidental findings for mothers
is unknown. A study in China revealed
that in a group of 181 pregnant women, for
whom follow-up procedures ruled out a problem
in the fetus, 16 (9%) had a previously
undiagnosed sex-chromosome abnormality.

On the basis of current estimates, if one 
million blood tests are performed in a given 
year, at least 2,000 women will have an abnor- 
mal result that disagrees with the results of a 
diagnostic procedure such as CVS or amnio- 
centesis. Most often this is because of con- 
fined placental mosaicism, but women are 
increasingly discovering abnormalities that 
may have implications for their own health. 
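
The arithmetic behind that estimate is simple enough to spell out. The sketch below is a back-of-the-envelope check that applies the false-positive rates quoted earlier (around 0.2% for the blood tests, roughly 5% for older screening methods) to one million tests. Treating every test as coming from an unaffected pregnancy is a simplifying assumption, and the function name is hypothetical.

```python
def expected_discordant_results(tests_per_year, false_positive_rate):
    """Expected number of abnormal screening results that a diagnostic
    procedure would later contradict, assuming the false-positive rate
    applies to every test performed."""
    return tests_per_year * false_positive_rate


if __name__ == "__main__":
    tests = 1_000_000
    for label, rate in [("DNA blood test", 0.002), ("older screening", 0.05)]:
        n = expected_discordant_results(tests, rate)
        print(f"{label}: about {n:,.0f} discordant results per {tests:,} tests")
```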


CONSENT IS CRUCIAL 

There is considerable confusion about how
to handle incidental findings. As a medical
geneticist, I frequently get calls and e-mails
from obstetricians and other health-care
providers asking, “What should I tell the
patient?” Most caregivers are still grappling
with the practical challenges of incorporating
a new type of prenatal test into clinical
care; few are familiar with genome sequencing
or trained to discuss the management of
a pregnancy that has been complicated by the
discovery of a maternal health problem.

TEST SCORES

Since late 2011, clinicians have been able to
screen mothers’ blood for fetal chromosome
problems using circulating DNA.

[Chart: worldwide maternal blood tests* (thousands) for 2012, 2013 and 2014; vertical axis marked at 400 and 800.]

*Numbers as reported by Illumina, Sequenom, Ariosa
Diagnostics, Berry Genomics and BGI in GenomeWeb articles.

30 | NATURE | VOL 522 | 4 JUNE 2015

I reviewed the consent forms used by five 
major US commercial providers of non- 
invasive prenatal blood tests. In two of them 
the physician, not the mother, signs the form 
stating that the mother has been counselled. 
In two of the other three, the mother signs 
the form, but the form either does not men- 
tion incidental findings or it explicitly states 
that the laboratory will not report them. 
Only one notes that, “in rare circumstances, 
genetic testing may reveal sensitive informa- 
tion about your own health”. 

To be fair, the possibility and extent of inci- 
dental findings for mothers has been appreci- 
ated only recently as the number of women 
being tested has increased. Still, providers 
must keep pace. Incidental findings could 
have major implications for mothers’ care, 
life- and health-insurance policies, and so on. 

A study published last year indicated that
6% of women who received an unusual pre- 
natal DNA blood test result terminated their 
pregnancies without having CVS or amnio- 
centesis. Women might weigh options differ- 
ently if they understood that a result could 
signal a genetic anomaly in themselves, 
rather than in their baby. 

Consent forms should be provided 
directly to the mother to sign. They should 
explicitly state that incidental results could 
emerge with implications for the mother’s 
health. Check boxes could be used to opt in 
or out of being told certain things: for example,
some women might want to know that
they have chaotic DNA patterns suggestive 
of a tumour, but prefer not to be told that 
they have a sex-chromosome abnormality. 

Also crucial is better education for parents 
and health-care providers about the various 
prenatal blood tests now available, including how
they work and what they can detect. Webinars 
or videos in multiple languages should be 
provided as soon as feasible by professional 
societies, such as the ACMG or the SMFM. 

In parallel, clinicians and researchers need 
to get a better grasp on what kind of inciden- 
tal findings could arise. New clinical tests 
have almost always been developed and vali- 
dated in academic laboratories before being 
licensed. Yet until late 2014, blood tests for 
prenatal DNA screening were exclusively 
provided by commercial labs. Although 
the major companies offering the tests have 
propelled the field forward by increasing the 
number of women being tested and publish- 
ing results, gaps in follow-up clinical informa- 
tion are impeding understanding. 

In the Netherlands, a nationwide evalua- 
tion is being conducted as a first step towards 
incorporating such tests into everyday health 
care. It is called the Trial by Dutch Laborato- 
ries for Evaluation of Non-Invasive Prenatal 
Testing (TRIDENT) study (see go.nature.com/qk2kpj).
All abnormal results are correlated
with the results of fetal or newborn
chromosome testing, ultrasound evalua- 
tion of fetal growth, and placental studies to 
rule out confined placental mosaicism. The 
United States and other nations need a similar
registry of the results of non-invasive prenatal 
testing, paired with clinical follow-up data. 

The speed at which these blood tests have 
taken off in mainstream health care has 
brought focus and urgency to issues long 
debated in genetic circles, particularly
when and how to report to patients second- 
ary findings from genomic sequencing. 
Handled properly, the incidental findings 
emerging from prenatal tests could acceler- 
ate treatments and save lives — rather than 
just increase the anxiety of thousands of 
pregnant women.
August Weismann, painted by Otto Scholderer in 1896.


EVOLUTIONARY BIOLOGY 


Paean to a founder 
of heredity 


Jane Maienschein applauds a study of towering 
nineteenth-century biologist August Weismann. 


A monumental study of an important
but surprisingly little-studied biologist,
August Weismann represents
half a century of scholarly investment by 
historian of science Frederick Churchill. 
Churchill immersed himself in the observa- 
tions and experiments, people, institutions 
and ideas of the nineteenth and early twentieth
centuries (an astoundingly fertile era
for the biological sciences) as well as in all of
the many books and articles that Weismann 
(1834-1914) wrote, in German and English. 
That anybody can write this kind of book
these days is awe-inspiring. When Churchill
was my PhD supervisor in the 1970s, he 
warned me not to take on too large a disser- 
tation topic, given the dearth of jobs and the 
pressures to move results into print quickly. 
The history of science is richer for his not 
having heeded his own advice. 

Weismann’s great contribution was the 
idea that germ-plasm — the name that he 
gave to the essential element of gametes, or 
eggs and sperm — carries the material of 
heredity from one generation to the next, 
unaffected by the environment. Germ-plasm
theory, which Weismann set out in the 1880s, rejected
the possibility that acquired characteristics can be inherited,
as propounded by Jean-Baptiste Lamarck in the early nineteenth
century. Weismann’s materialistic view of life provided a new
understanding of biology, in which natural selection occurred
between individual organisms, as Charles
Darwin held, and with competition at the
level of inherited determinants. This influenced
the nature of the resulting variations.
So Weismann, as Churchill makes clear, had
one foot in the observational traditions and
questions of nineteenth-century natural history,
even as he extended the other into the
experimental, theoretical twentieth century.

August Weismann: Development, Heredity, and Evolution
FREDERICK B. CHURCHILL
Harvard Univ. Press: 2015.

August Weismann reveals a scientist who 
grew up fascinated by nature and worked 
briefly as a medical doctor, then adjusted to 
roles as a faculty member at the University 
of Freiburg in Germany and director of its 
zoological institute, where he spent most 
of his career. Weismann looked at butterfly 
variations in considerable detail to explore 
patterns of heredity, protozoa to get at 
reproduction, jellyfish-like hydromedusae, 
the planktonic crustaceans Daphnia, and 
frogs — whatever it took to study the rel- 
evant phenomena. In 1862, while living at 
the isolated Schaumburg Palace as personal 
physician to Archduke Stephan of Austria 
for a year, he also explored natural his- 
tory, especially of insects. The experience 
reinforced Weismann’s determination to 
pursue biological research rather than med- 
icine, and he moved to Freiburg, where he 
took on a series of research positions. 

He increasingly turned to microscopy, 
experimental embryology and cytology to 
look deep into an organism to see changes 
in the cells and nucleus. At the Freiburg 
zoological institute, Weismann relied on his 
own observations and those of his students 
and assistants — both in the spirit of col- 
laboration and to compensate for his failing 
eyesight. 

Throughout, Weismann insisted that 
development, heredity and evolution are 
interconnected and must be studied as 
such. As he saw it, biologists should seek 
the underlying chemical and physical 
mechanisms. So while his contemporaries
focused on details of cells, chromosomes or
evolutionary mechanisms, Weismann sought to
illuminate the links between them.

Churchill shows how Weismann’s 
experimental observations of chromo- 
some movements during cell division 
reinforced his germ-plasm theory. 
Weismann adapted ideas from leading 
cytologists and experimental embry- 
ologists such as Theodor Boveri and 
Wilhelm Roux, to link heredity and 
development through what Churchill 
calls his “architectonic view”. Instead of a 
holistic or vitalistic understanding of the 
organism, Weismann developed a more 
structural view in which the system 
depends on integration of the material 
parts, with guidance from the germ- 
plasm and its determinants. Develop- 
ment then constructs the organism out 
of cells, with reproduction providing 
a source of variation on which natural 
selection acts, enabling evolution. 

Weismann was confident and some- 
times controversial. Churchill shows 
how disagreements helped Weismann 
to work out his own ideas. He sparred, 
for example, with physician and biologist 
Rudolf Virchow over issues of acquired 
characteristics and the role of external 
factors in shaping variation. And Weis- 
mann clashed with zoologist Theodor 
Eimer about the apparent randomness 
of evolution. Dozens of leading scientists 
influenced or interacted with Weismann 
because of his central role in biology. The 
zoologist and illustrator Ernst Haeckel, 
for example, was a close friend, but his 
views diverged from Weismann’s in ways 
that influenced both men, and affected 
public perceptions of biology. We also see 
the impact on Weismann of technically 
brilliant figures such as Boveri and theo- 
reticians including Darwin. Weismann in 
turn influenced his contemporaries and 
subsequent generations of Darwinians. 

Even as his contemporaries began to 
specialize and to give up the study of con- 
nections between heredity, development 
and evolution in favour of specialized 
study of the parts or selected processes, 
Weismann worked hard to develop a 
comprehensive understanding of life. 
Churchill has mirrored that determina- 
tion in developing a compelling and com- 
prehensive understanding of Weismann, 
his ideas, work, life, contemporaries and 
context.
The hybrid dinosaur Indominus rex runs rampant in Jurassic World.


Q&A Jack Horner 


The dinosaur doctor 


Montana palaeontologist Jack Horner has served as scientific adviser on the Jurassic Park films
from the start. With the latest, Jurassic World, soon to be released, he talks about a
shark-devouring Mosasaurus, breeding chickens back into dinosaurs and the influence of the film
franchise on his own field.


How did you get 
involved in the series? 
In the early 1990s, a 
colleague called me 
and said, “You’re in a
book about cloning 
dinosaurs” — Michael 
Crichton’s Jurassic 
Park (Alfred A. Knopf, 
1990). I said, “I hope my character doesn't get 
eaten.” I never bothered to pick it up; I am
dyslexic and have trouble enough keeping up 
with my own science. Then director Steven 
Spielberg called and asked whether I wanted 
to work on the film. I thought growing a dino- 
saur was an intriguing idea, and I still do. It is 
a little far-fetched now, but I think one day we
will be able to do it, not using amber-trapped 
DNA, but through genetic modification of 
dinosaurs’ closest living relatives, birds. 


What did work on Jurassic Park (1993) entail? 
My job was to find things that were obviously 
wrong. In one scene, the puppeteers were hav- 
ing trouble getting an animatronic Tyranno- 
saurus rex leg to move properly. So I stepped 
in to control the joystick, making the foot land 
on its toes in a bird-like position, rather than
heel-first like a mammal. In a kitchen scene, 
the puppeteers had velociraptors sticking out 
forked tongues, which dinosaurs did not have. 
Instead, we had the raptors snort to fog up the 
window, revealing that they had warm blood. 


What are the innovations in Jurassic World? 

The science has got ahead of the films, but we 
cannot really change the way the dinosaurs 
look. If suddenly the raptors had feathers, it 
would destroy consistency. But I did help to 
render new creatures. You can see a mosasaur, 
a giant swimming reptile, shoot up from a 
tank to eat a great white shark. From my 
research, I helped to ensure that the juvenile 
triceratops, with its backward-curving horns, 
looked distinct from the adult, whose horns 
curve forward. But my biggest job was helping 
to create the genetically modified Indominus
rex, a combination of several dinosaurs and 
other animals, which turns against its makers. 


How plausible is such a dino-hybrid? 

Jurassic World is set in the future. If you can
clone a dinosaur, you can modify its DNA
and combine it with that of other animals. We
already have lots of tools for modifying an animal.
We have been breeding them for centuries. Now
we are getting to the point where we can take
genes out of one organism and put them
into another, for example taking fluorescent
genes out of jellyfish and putting them into
the embryos of other animals to make them
glow in the dark. The challenge is finding
ways of changing a creature without killing
it. And I think we will.

Jurassic World
DIRECTOR: COLIN TREVORROW
Universal: 2015.


Are you trying to breed birds back into 
dinosaurs? 

In the Dino-Chicken Project at Montana 
State University in Bozeman, we are looking 
for the genetic pathways that provided the 
transformation from dinosaurs into birds, 
with the hope that some of those pathways 
can be reversed. Part of it is genetic engi- 
neering to see if we can get a long tail back 
on a chicken (D. J. Rashid et al. EvoDevo 
5, 25; 2014). My postdoc Dana Rashid has 
screened mouse genes, looking for pathways 
that cause mice to lose their tails. If she can 
find one that causes a similar reaction in a 
reptile, it might be possible to reverse the 
process and grow a tail on a chicken. 


Do the films do justice to the science? 

Each film explains a bit of the science, for 
example through the dancing DNA cartoon in 
the first movie. If people are wondering about 
whether the science in Jurassic World is real, 
that is great for science. Jurassic Park brought 
out all sorts of students who wanted to switch 
careers into palaeontology. It channelled a 
flood of graduate students to my lab, includ- 
ing some of the best scientists I have trained. 


How have digital effects changed your work? 
For the first film, I would sit with Steven 
Spielberg and advise him on the motions of 
the dinosaur puppets. But Jurassic World had 
only one puppet on set — an injured sauro- 
pod. For the rest of the dinosaurs, most of my 
consulting was with the graphics people. 


What do we know about how dinosaurs 
behaved? 

They were more like robins than crocodiles. 
Their spikes and shields were too flimsy for 
fighting and were more likely to be for display, 
like the bony crests on some modern birds. 
Some dinosaurs had feathers and probably 
danced like birds. If you built a Jurassic Park,
it would be more like the Serengeti than Jaws.
I wrote a script once for a film where scientists
come out of their time machine to see triceratops
dancing and showing off their coloured
shields. Nobody would go to that movie.


INTERVIEW BY JASCHA HOFFMAN 


This interview has been edited for length and clarity.
Books in brief


The Worm at the Core: On the Role of Death in Life 

Sheldon Solomon, Jeff Greenberg and Tom Pyszczynski RANDOM HOUSE 
(2015) 

How do we cope with the knowledge of mortality? In this considered 
treatise, psychologists Sheldon Solomon, Jeff Greenberg and Tom 
Pyszczynski present their “terror management theory”, positing that 
we hold off existential fear through our cultural world view and sense 
of personal significance. Drawing on several disciplines and many 
experimental-psychology studies, they conclude that embracing 
ambiguity and cultivating meaning in life create the basis for the 
finely calibrated courage that we need to face our inevitable end. 


Empire of Tea: The Asian Leaf That Conquered the World 

Markman Ellis, Richard Coulton and Matthew Mauger REAKTION (2015) 
‘Tea’ has at least five meanings: the shrub Camellia sinensis; its leaf; 
the dried commodity; the infusion made from it; and the occasion
for consuming the infusion. As Markman Ellis, Richard Coulton and
Matthew Mauger show in this stimulating volume, history is steeped 
in the stuff. In eighteenth-century Britain, tea smugglers murdered 
customs officers; across the Atlantic, excise duty provoked the Boston 
Tea Party. In 1920, say the authors, John Maynard Keynes “imagined 
tea at the centre of the modern mercantile world”. With 290 billion 
litres of tea imbibed in 2013, the taste for it seems set to grow. 


The House of Owls 

Tony Angell YALE UNIVERSITY PRESS (2015) 

Wildlife artist and naturalist Tony Angell, who memorably explored 
corvid behaviour with John Marzluff (see N. Clayton Nature 484, 
453-454; 2012), here turns to the owl. A self-confessed strigiphile, 
Angell has had western screech owls (Megascops kennicottii) nesting
outside his home in Washington state for 25 years, and his exquisite 
monochrome illustrations testify to that intimate coexistence. Angell 
delves, too, into the owl in culture, and the ranges and habitats of the 
19 species found in North America. A treat for fans of these strangely 
remote, inquisitive, astonishingly sharp-eared and -eyed raptors. 


Move: Putting America’s Infrastructure Back in the Lead 

Rosabeth Moss Kanter W. W. NORTON (2015) 

The US transport infrastructure is riddled with “pain points and 
bottlenecks”, from delayed flights to crumbling bridges. So notes 
Harvard business professor Rosabeth Moss Kanter in this propulsive 
study, which argues for an overhaul of US transport to boost the 
economy, ease commuting and curb emissions. Kanter delivers a 
number-crunched analysis of the state of road, rail and air transport, 
and details progress on intelligent transportation and smart cities. 
But with government and industry preventing advances, the prime 
hurdle, she notes, is a lack of political will at the top. 


Spirals in Time: The Secret Life and Curious Afterlife of Seashells 
Helen Scales BLOOMSBURY SIGMA (2015) 

Structurally elegant and often stunningly marked, seashells have 
obsessed scientists for centuries — as attested by the millions housed 
in London’s Natural History Museum alone. In this engaging study of 
molluscs, marine biologist Helen Scales covers a wealth of research 
on this vast phylum, from findings on shell shape and colour (rococo 
formations may deter predators, whereas pigmented patterns could 
be a mollusc’s way of tracking its own construction process), to the 
ecosystem services performed by oyster beds. Barbara Kiser 
Ukrainian science
needs elixir of youth 


Ukraine's science system stands 
to benefit from its association 
with the European Union (EU) 
Horizon 2020 flagship research 
programme (Nature http://doi.org/4kq;
2015). But it has
problems beyond funding: the 
re-election of Boris Paton as 
president of Ukraine's National 
Academy of Sciences at the age of 
96 is symptomatic. 

We are involved in an initiative 
to boost cooperation between the 
EU and Ukraine in biomedicine 
(COMBIOM). In our view, this 
will be difficult as long as young 
scientists feel that they are being 
held back by the rigid Soviet-style 
system run by scientists of the old 
school. Early-career researchers 
want to gain experience abroad 
and have little incentive to return. 

Ukraine's science system must 
be made more competitive. It 
should reward young scientists 
who have international expertise 
and enable them to lead research 
teams. It should encourage job 
flexibility and contracts for 
academy researchers, and identify 
strategies and research areas to 
optimize scientific development. 
It should create institutions that 
specialize in those areas, and 
appoint an independent body of 
EU researchers and Ukrainian 
scientists abroad to evaluate 
internal funding applications. 

Such measures would create a 
healthy scientific community and 
promote Ukraine's integration 
with the European Research Area. 
Yegor Vassetzky CNRS- Institut 
Gustave Roussy, Villejuif, France. 
Ivan Gout University College 
London, UK. 

Jacek Kuznicki International 
Institute of Molecular and 
Cellular Biology, Warsaw, Poland. 
vassetzky@igr.fr 


Improve oversight of
fracking in China


We are concerned that China is
paying insufficient attention to
earthquakes that are induced by
injecting huge volumes of waste
water deep underground (see
also Nature 520, 418-419; 2015).

Most of China’s shale-gas 
resources lie near seismic fault 
zones. Wastewater injection by 
Chinese oil and gas industries 
has induced minor earthquakes 
in Sichuan, Chongqing, 
Xinjiang, Henan, Liaoning 
and Hubei — 6 of 13 provinces 
prioritized for shale-gas 
exploitation (see go.nature.com/uriceh;
in Chinese). In
Chongqing’s Rongchang gas 
field, for example, 32,000 surface 
earthquakes were recorded 
between 1998 and 2006 (X. Lei 
et al. J. Geophys. Res. Solid Earth 
113, B10310; 2008). 

Replacing coal with gas is 
central to China's plans to reduce 
air pollution and carbon dioxide 
emissions. We call for stricter 
regulation and tighter monitoring 
of its fracking industry to curb 
seismic activity and environmental 
pollution (see H. Yang et al. 
Nature 499, 154; 2013). 

Hong Yang University of Oslo, 
Norway. 

Julian R. Thompson, Roger 
J. Flower University College 
London, UK. 
hongyanghy@gmail.com 


Use ‘4Rs’ criteria to 
assess papers 


We propose a ‘4R’ approach to
assessing reported research, 
underpinned by statistical rigour 
(see J. T. Leek and R. D. Peng
Nature 520, 612; 2015). These 4Rs 
denote reproduction, replication, 
robustness and revelation. 

Journals are aware of the 
need for the first two: whether 
enough information is available 
to reproduce an experiment, 
and whether its original results 
can be replicated. Even if the 
experiment can be reproduced, 
replication is often an issue, so 
journals are increasingly asking 
authors for details of software 
code and raw data. Videos of 
each experimental step could 
also be included. 

Variations in experimental and 
analytical methods are a concern
for referees and readers, hence 
the need for robustness. A well- 
conducted study should indicate 
the sensitivity of its conclusions 
to the various assumptions that 
were made in deriving them. 
Revelation relates to the 
need for accountability and 
transparency. Scientists must 
communicate more effectively 
by disclosing their reasoning 
for how they develop strategies, 
derive insights and draw 
conclusions. 
Adrian Pagan University of 
Sydney; and Melbourne Institute 
of Applied Economic and Social 
Research, Australia. 
Benno Torgler Queensland 
University of Technology, Australia. 
benno.torgler@qut.edu.au 


Bird sequencing 
project takes off 


On 3 June, the Avian 
Phylogenomics Consortium 
announced its ‘Bird 10K’ project 
to generate draft genome 
sequences for about 10,500 extant 
bird species over the next 5 years. 
The sequences, along with 
data we aim to collect on the 
morphological, physiological, 
ecological and behavioural traits 
of every bird species, will inform 
studies on avian evolution, 
ecology, population genetics, 
neurobiology, development and 
conservation. They could also be 
useful for investigating infections 
that pass from animals to 
humans, such as avian influenza. 
This wealth of information 
will allow us to complete the 
genomic tree of life for modern 
birds. We hope to decode the 
links between genotypes and 
phenotypes; to determine genetic 
evolutionary, biogeographical 
and biodiversity relationships 
across species; and to evaluate 
how ecological factors and 
humans affect bird evolution. 
We plan to conduct the project 
in four phases, based on the 
avian classification hierarchy. 
The first, ordinal phase (for 
34 orders of birds) has been 
accomplished (see also G. Zhang
et al. Science 346, 1308-1309; 
2014). Collection of genomic 
data for the second, familial 
phase (about 240 families) is 
ongoing. Specimen and trait- 
data collection for the third 
phase (2,250 genera) and the 
fourth phase (the remaining 
8,000 or so species) is under way. 
See http://b10k.genomics.cn for 
more information. 

Guojie Zhang* China National 
GeneBank, BGI-Shenzhen, China. 
zhanggj@genomics.cn 

*On behalf of 6 correspondents (see 
go.nature.com/v9sI8z for full list). 


Diagnostic service 
shares BRCA data 


As a partner in the BRCA Share
initiative for breast-cancer 
genetic data, we wish to clarify 
our position (see Nature 520, 
585; 2015). 

Quest Diagnostics tests one 
in three US adults annually, 
including for BRCA gene 
mutations. We support open- 
access sharing of these data, once 
the complexities of uploading so 
many records can be resolved. 

A final test run for uploading 
Quest data to the Leiden Open 
Variation Database (LOVD) is 
now complete. Contrary to your 
implication, we anticipate that 
the publicly available database 
funded by the US National 
Institutes of Health, ClinVar,
will eventually have access to 
these data because of a reciprocal 
relationship with the LOVD. 

Industry is often criticized for 
not giving back. Labs that make 
revenue from BRCA testing pay 
to participate in BRCA Share; 
academic scientists and entities 
do so for free. This reduces 
the need for public funding to 
improve BRCA tests. Some of the 
fees will go to functional studies 
of BRCA variants. BRCA Share 
also raises the bar of responsibility 
for commercial labs. 

Charles Strom Quest Diagnostics, 
California, USA. 
charles.m.strom@ 
questdiagnostics.com 
Pluto leads the way
in planet formation 


Images from the Hubble Space Telescope cast new light on the orbits, shapes 
and sizes of Pluto’s small satellites. The analysis comes just before a planned 
reconnaissance by the first spacecraft to visit them. SEE ARTICLE P.45 


SCOTT J. KENYON 


Pluto and its large moon Charon together
make up the only ‘binary planet’ in the
Solar System. With a mass roughly 11%
that of Pluto, Charon orbits the binary system’s
centre of mass at a distance of 17,500 kilometres
every 6.4 days. Over the past decade,
images from the Hubble Space Telescope
(HST) have revealed four circumbinary
satellites with orbital periods of 20-40 days
and masses roughly 0.001% (or less) of Pluto’s
(Fig. 1). Before the discovery of the innermost
and least massive of these moons, Styx, dynamical
studies had suggested that the other three,
Nix, Kerberos and Hydra, are packed as closely
together as possible, with no room for other
stable satellites between their orbits.

On page 45 of this issue, Showalter
and Hamilton present an analysis of
all available HST images of the system,
and derive new orbits and masses for
the moons. They also derive limits on
the moons’ previously unknown shapes
and reflectivities. As well as confirming
that the moons are in extremely tight
orbits, the authors infer new relationships
between the orbital periods of
satellite pairs. These results may help
us to understand how planets and
satellites form and remain on stable
orbits for billions of years.

The architecture of Pluto’s small
satellites closely resembles that of
several planetary systems discovered
by the Kepler space observatory
(Fig. 2). In these systems, every object
has a gravitational sphere of influence
that prevents other objects from
orbiting nearby. The more massive the
object, the larger its sphere of influence.
When the gravitational spheres
of neighbouring objects nearly overlap, it is
impossible to place other bodies on stable 
orbits between them. In tightly packed sys- 
tems, the spheres of several (perhaps all) of the 
objects almost overlap. Small particles, such as 
interplanetary dust, might orbit in these inter- 
mediate regions, but large objects cannot. 
These tightly packed systems place severe
constraints on theories of planetary-system
formation. According to current thinking,
planets (and satellites) start as small seeds in
a disk or ring surrounding the star (or planet)
at the centre. These seeds grow by agglomerating
other small solid objects along their orbits.
Eventually, growing bodies feel the gravitational
tugs of others in the system. Continued
growth results in ‘overpacking’, whereby the
spheres of influence of many growing objects
in orbit overlap. As the gravitational forces
between these objects build, their orbital
motions become chaotic, and further growth
is promoted through mergers of objects. When
only a few planets (or satellites) remain, they
settle into nearly circular orbits and their
spheres of influence do not overlap. How some
systems end up with objects in closely packed
orbits is an open question.

Figure 1 | Pluto and its satellites. This optical image, taken by
the Hubble Space Telescope, depicts Pluto, its large moon Charon
and four smaller moons Styx, Nix, Kerberos and Hydra. The image
was taken in July 2012 when Styx was discovered. Showalter and
Hamilton have used such images to derive several properties of
Styx, Nix, Kerberos and Hydra. The ellipses shown are illustrative
paths of the moons around the centre of mass of the system.
Scale bar, 20,000 km.

Current hypotheses on the formation of
the Pluto-Charon system focus on a giant 
impact in which a proto-Charon collided 
with a proto-Pluto to form a binary planet 
surrounded by an expanding ring of debris. 
Pre-existing moons might have survived the 
impact and new moons may have grown out 
of small particles in the debris. As well as 
having ended up in tightly packed orbits, the 
four moons that are the end product of this 
process (Styx, Nix, Kerberos and Hydra) exist 
in orbits with orbital periods in an observed 
ratio of roughly 3:4:5:6 times that of Charon,
respectively. High-quality measurements of 
the orbits and masses of all the moons in the 
system are needed to understand how this 
process works. 

To constrain these properties, Showalter 
and Hamilton measure precise positions of 
the moons on the HST images. Assuming 
that the four moons follow elliptical orbits 
around Pluto-Charon, the authors
present detailed modelled fits to their 
positions that yield the period, orien- 
tation (the inclination of the orbital 
plane with respect to the orbital plane 
of Pluto-Charon) and ellipticity of each 
orbit. Variations in the brightness of the 
moons at different times along their 
orbits allowed the authors to derive 
estimates of their sizes, shapes, reflec- 
tivities and masses. They conclude that 
the moons have orbital-period ratios 
of 3.16:3.89:5.03:5.98 — close to, but 
not quite, integers. Curiously, the syn- 
odic period of Styx and Nix (the time 
interval between orbital phases when 
two moons line up on the same side of 
their planet) is almost exactly 1.5 times 
the synodic period of Nix and Hydra. 
How this ‘three-body resonance’ devel- 
oped during the growth of the moons 
is unclear.
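
The synodic-period relation quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis: it assumes circular, coplanar orbits, uses only the rounded period ratios given in the text, and the function and constant names are illustrative.

```python
# Orbital periods in units of Charon's period, as quoted in the text (rounded values).
PERIOD_RATIOS = {"Styx": 3.16, "Nix": 3.89, "Kerberos": 5.03, "Hydra": 5.98}
CHARON_PERIOD_DAYS = 6.4  # Charon's orbital period, also quoted in the text


def synodic_period(p_inner: float, p_outer: float) -> float:
    """Time between successive alignments of two bodies orbiting the same centre.

    Assumes circular, coplanar orbits: 1/S = 1/P_inner - 1/P_outer.
    The result is in the same units as the inputs.
    """
    if p_inner >= p_outer:
        raise ValueError("p_inner must be shorter than p_outer")
    return 1.0 / (1.0 / p_inner - 1.0 / p_outer)


def synodic_ratio(ratios: dict = PERIOD_RATIOS) -> float:
    """Ratio of the Styx-Nix synodic period to the Nix-Hydra synodic period."""
    styx_nix = synodic_period(ratios["Styx"], ratios["Nix"])
    nix_hydra = synodic_period(ratios["Nix"], ratios["Hydra"])
    return styx_nix / nix_hydra


if __name__ == "__main__":
    styx_nix_days = synodic_period(PERIOD_RATIOS["Styx"], PERIOD_RATIOS["Nix"]) * CHARON_PERIOD_DAYS
    nix_hydra_days = synodic_period(PERIOD_RATIOS["Nix"], PERIOD_RATIOS["Hydra"]) * CHARON_PERIOD_DAYS
    print(f"Styx-Nix synodic period:  {styx_nix_days:.1f} days")
    print(f"Nix-Hydra synodic period: {nix_hydra_days:.1f} days")
    print(f"Ratio: {synodic_ratio():.2f}")
```

With the two-decimal ratios from the text the result comes out near 1.5 (about 1.51); the small offset is within what the rounding of the inputs would allow.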

The shapes and compositions of
Pluto-Charon’s four moons provide
crucial tests of models of planet and
satellite formation. Large fragments that
survived the giant impact, thought to have
led to the creation of the system, might have
irregular shapes; satellites grown from much
smaller particles might be more rounded. The
authors find that the ellipsoidal shapes of the
two larger moons, Hydra and Nix, seem more
consistent with grown satellites than with
impact fragments. Their optical reflectivity, at
40%, is similar to Charon’s (36-39%), but lower
than Pluto's (50-65%, which is comparable to
the reflectivity of sea ice). With a reflectivity
of only 4-6%, Kerberos is as dark as coal and
seems out of place with such bright companions.
Perhaps it is a dark fragment that was
ejected during the giant impact.

Figure 2 | Orbital architecture. The satellite system of Pluto-Charon resembles some of the exoplanet
systems discovered by the Kepler space observatory. Pluto’s small moons orbit the system’s centre of mass
clockwise; the exoplanets orbit their respective stars (Kepler 730 and Kepler 2169). For each system, the
scale is set relative to the orbit of the innermost moon or planet (the relative scales vary across systems;
the gap between Pluto and Charon is not on the same scale as the orbits of the moons). The dots indicate
the relative positions of the moons or planets; the circles show their respective gravitational spheres of
influence. Similarly to the exoplanets, the spheres of influence of Pluto's moons leave little space for other
potential (as yet undiscovered) objects in intermediate orbits.

It is hoped that NASA’s New Horizons
spacecraft, due to fly by Pluto in July, will throw 
yet more light on these questions. Close-up 
images taken by the spacecraft will further 


Opening LOX 
to metastasis 


New findings implicate the enzyme lysyl oxidase (LOX), secreted by oxygen- 
deprived breast cancer cells, in inducing bone lesions that precede and facilitate 
the spread of the cancer cells to the bone. SEE LETTER P.106 


NETA EREZ 


Despite extensive research, breast cancer
remains one of the leading causes
of cancer-related deaths in women,
and mortality from breast cancer is almost 
exclusively a result of the tumour spreading to 
distant organs. Bones are the most common 
site of metastasis associated with breast cancer, 
affecting up to 80% of women with metastatic 
disease. Bone metastases are typically incur- 
able and encompass severe disease features, 
including pain, bone destruction, hypercalcaemia
and debilitating skeletal-related events.
In this issue, Cox et al. (page 106) establish a
mechanistic link between bone metastasis of 
breast tumours and expression of the enzyme 


lysyl oxidase (LOX) by breast cancer cells. 
Metastases in bones and other organs are 
typically diagnosed months or years after the 
initial diagnosis and removal of the primary 
tumour. This temporal lag is, at least in part, due 
to the fact that although disseminated tumour 
cells have cell-intrinsic survival and prolifera- 
tive programs, they must be able to manipulate 
tissue cells in the new and hostile microenviron- 
ment of the metastatic organ to support their 
growth. The early molecular changes at the
metastatic niche are the rate-limiting step of
metastasis, and understanding the mechanisms
that facilitate the formation of a hospitable niche
is a central challenge in cancer research. 
Hypoxia (lack of an adequate oxygen supply) 
in the primary tumour is generally associated 
constrain the sizes, shapes and reflectivities
of Nix, Kerberos and Hydra, but not of Styx 
— it is too small to be resolved in the images. 
The mission’s spectroscopic measurements 
of the relative abundances of various ices will 
probably yield a reflectivity for Styx, and allow 
comparison of the compositions of the satel- 
lites. If new satellites or rings of small particles 
are found, and their bulk properties estab- 
lished, this will provide additional information 
on the extent of the system. These much- 
anticipated observations will lead to improved 
theories of the formation and evolution of 
planets and their satellites. Linking all these 
results to ongoing observations of the growing 
population of known exoplanets will extend 
tiny Pluto’s reach far beyond the Solar System.


Scott J. Kenyon is in the Department of Solar, 
Stellar and Planetary Physics, Smithsonian 
Astrophysical Observatory, Cambridge, 
Massachusetts 02138, USA. 
with increased metastases. However, when
Cox and colleagues performed retrospective
analyses of hypoxic breast tumours from
humans, they found that hypoxia was correlated
with increased bone metastases only in a
subtype of breast tumour that does not express
the receptor for oestrogen (ER-negative tumours). In
an attempt to identify the factors underlying 
this specificity, Cox et al. analysed the proteins 
secreted by those breast cancer cells that were 
attracted to the bone and found that high levels 
of LOX were associated with bone metastases 
in ER-negative breast tumours. LOX belongs to a
family of secreted proteins that crosslink
collagen fibres in the extracellular matrix (ECM),
which determines the strength and structural
integrity of tissues. LOX has been shown to
contribute to metastasis of breast cancer to
lungs by modifying the ECM at the metastatic
niche, but it had not previously been
implicated in regulating bone homeostasis.

Using a transplantable mouse model of breast 
cancer that spontaneously metastasizes to bone, 
the authors demonstrate that LOX is secreted 
by hypoxic breast cancer cells and that it dis- 
rupts the balance between bone formation and 
destruction such that there is greater overall 
bone loss (resorption). These sites of damaged 
bone provide a favourable environment for dis- 
seminated breast cancer cells, thereby facilitat- 
ing the formation of bone metastases. Moreover, 
THE PROJECT TWINS


TOOLBOX 


HOW TO CATCH A CLOUD


Why cloud computing is attracting scientists — and advice from 
experienced researchers on how to get started. 


BY NADIA DRAKE 


In February, computer scientist Mark
Howison was preparing to analyse RNA
extracted from two dozen siphonophores,
marine animals closely related to jellyfish
and coral. But the local high-performance
computer at Brown University in Providence,
Rhode Island, was not back up to full
reliability after maintenance. So Howison fired
up Amazon's Elastic Compute Cloud and bid
on a few ‘spot instances’: vacant computing
capacity that Amazon offers to bidders at a
discounted price. After about two hours of
fiddling, he had configured a virtual machine
to run his software, and had uploaded the
siphonophore sequences. Fourteen hours and
US$61 later, the analysis was done.

Researchers such as Howison are increasingly
renting computing resources over the Internet
from commercial providers such as
Amazon, Google and Microsoft, and not
just for emergency backup. As noted in a 
2013 report sponsored by the US National 
Science Foundation (NSF) in Arlington, 
Virginia, the cloud provides labs with access 
to computing capabilities that they might not 
otherwise have (see go.nature.com/mxh4xy). 
Scientists who need bursts of computing power 
— such as seismologists combing through data 
from sensors after an earthquake or astrono- 
mers processing observations from space 
telescopes — can rent extra capacity as needed, 
instead of paying for permanent hardware. 
Scientists can configure their cloud 
environment to suit their requirements. 
Although cloud computing cannot handle 
analyses that require a state-of-the-art super- 
computer or quick communication between 
machines, it may be just right for projects that 
are too big to tackle on a desktop, but too small 
to merit a high-performance supercomputer. 
And working online makes it easy for teams to 
collaborate by sharing virtual snapshots of their 
data, software and computing configuration.
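
As a rough illustration of the pay-as-you-go arithmetic, the sketch below takes the only figures quoted for Howison's run (14 hours, US$61) and derives an effective hourly rate, then reuses that rate to size another burst job. It assumes that cost scales linearly with running time, which spot pricing does not guarantee, and the function names are illustrative rather than part of any provider's tooling.

```python
def effective_hourly_rate(total_cost_usd: float, hours: float) -> float:
    """Average cost per hour of a finished job."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_cost_usd / hours


def estimate_burst_cost(rate_usd_per_hour: float, hours: float) -> float:
    """Projected cost of a job of the given length at a constant rate
    (a simplifying assumption; real spot prices fluctuate)."""
    if hours < 0:
        raise ValueError("hours must not be negative")
    return rate_usd_per_hour * hours


# Figures quoted in the article for the siphonophore analysis.
rate = effective_hourly_rate(total_cost_usd=61.0, hours=14.0)
print(f"Effective rate: US${rate:.2f} per hour")

# Hypothetical follow-up job: the 48-hour length is an example value only.
print(f"48-hour job at the same rate: US${estimate_burst_cost(rate, 48):.2f}")
```

The estimate is only as good as the constant-rate assumption, so it is best treated as a budgeting starting point rather than a quote.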

But shifting science into the cloud is not 
a trivial task. “You need a technical back- 
ground. It’s not really designed for an end 
user like a scientist,” says Howison. Although 
the activation energy might be high, there are 
recommended routes for scientists who want to 
try setting up a cloud environment for their own 
research group or lab. 


A DIY GUIDE TO CLOUD COMPUTING

Most cloud platforms require users to have 
some basic computing skills, such as an under- 
standing of how to work in the command 
line, and a familiarity with operating systems 
and file structures. Once researchers have 
a strong foundation, the next step is to try 
working in a cloud. 

The most user-friendly cloud for scientists, 
says plant biologist Andreas Madlung, could 
be the platform Atmosphere, which was 
created as part of a collaborative
cyberinfrastructure project called iPlant. Funded
by the NSF and led by three US universities and 
the Cold Spring Harbor Laboratory in Long 
Island, New York, iPlant has been helping 
scientists to share software and run free 
analyses in the cloud since 2008. 

Designed with scientists in mind, the 
platform's interface comes with pre-loaded 
software, a suite of practice data sets and dis- 
cussion forums for users to help each other to 
tackle problems. Madlung, at the University of 
Puget Sound in Tacoma, Washington, teaches 
an undergraduate bioinformatics course that 
includes a section on cloud computing. He first 
introduces his students to the Unix operating 
system, then has them use that knowledge to 
analyse RNA sequence data on Atmosphere. 

Those who sign up with iPlant are automati- 
cally given what equates to around 168 hours 
of processing time a month, and can request 
more if needed. Users can load up virtual 
machines with any extra software that they 
need, and if a job is too much for standard 
equipment to handle, tasks can be offloaded 
to a supercomputer at the Texas Advanced 
Computing Center in Austin, where iPlant has 
a guaranteed allocation. 

Biologist Mike Covington of the University 
of California, Davis, shifted his lab’s com- 
puting work to iPlant after its servers kept 
crashing because they were overloaded. He has 
also made copies (‘images’) of his own virtual 
machine, so that his collaborators — and any 
iPlant user — can log in and access the same 
software, data and computing configuration. 
“If I spend several hours setting up my virtual 
machine perfectly for de novo genome assem- 
bly [reconstructing full-length sequences from 
short fragments of DNA], I can quickly and 
easily make it available to any other scientist 
in the world that wants to do de novo assembly 
with their own data,” Covington says.

Such virtual snapshots may become standard 
for projects that require computational work. 
Anyone who wants to reproduce, for example, 
the microbial-genome analysis described in one 
paper can access a snapshot of the authors’ vir- 
tual machine on the Amazon cloud, simply by 
paying for Amazon computing time (B. Ragan- 
Kelley et al. ISME J. 7, 461-464; 2013). 


PICK A CLOUD 

For some researchers, choosing a cloud is 
straightforward. Scientists at CERN, Europe's 
particle-physics laboratory near Geneva, 
Switzerland, have had access to a massive inter- 
nal cloud running on the software platform 
OpenStack since 2013. A handful of institu- 
tions, such as Cornell University in New York 
and the University of Notre Dame in Indiana, 
have developed computing clouds, too. Some, 
including Notre Dame, outsource their clouds 
to companies such as Rackspace Private 
Cloud, a multi-national firm in San Antonio, 
Texas, that sets up and manages cloud services 
for users. But for scientists who are not at an 


CLOUD RESOURCES 


A guide for the perplexed 


• Clouds for researchers:

The largest commercial providers include 
Amazon’s Elastic Compute Cloud, 
Microsoft’s Azure and Google’s Cloud 
Platform. Other services are Terminal.com, 
aimed specifically at research; the (free) 
Atmosphere cloud platform, from the US 
National Science Foundation-backed iPlant 
collaboration; SageMathCloud; Cornell 
University’s RedCloud; Digital Ocean — 
known for quick deployment of cloud apps; 
and Rackspace — a company that sets up 
clouds using OpenStack, an open-source 
cloud-software platform that the firm 
developed jointly with NASA. 


• Useful resources for cloud explorers:
StarCluster is a tool developed at the 
Massachusetts Institute of Technology in 
Cambridge that helps to build a virtual 
research-computing cluster on Amazon’s 
platform. Docker is an open-source platform 
that allows researchers to share a snapshot of 


institution with a fully functional campus cloud, 
bushwhacking through the jungle of cloud 
options can be a frustrating adventure (see ‘A
guide for the perplexed’). Cloud system set-up 
can vary, and proficiency with one provider 
does not guarantee an easy transition to others. 

Casey Dunn, an evolutionary biologist who 
works with Howison at Brown University, pre- 
fers to train students on commercial platforms. 
“When they go on to a postdoc somewhere else 
or start their own lab, they'll still be able to log 
into Amazon,” he says.

Somalee Datta, the director of bioinformatics 
at Stanford University’s Center for Genomics 
and Personalized Medicine in California, is 
using Google's cloud platform to support the 
centre’s enormous amount of genomics data and 
computing demand, rather than relying only 
on the servers available at Stanford. She chose 
Google, she says, for several reasons: the com- 
pany’s developers were actively making tools 
available for genomics researchers, Google had 
demonstrated interest in health-care research 
— and the price was right. 


CLOUD CONCERNS 

For Datta and others, one key issue surrounding 
cloud computing is security. “It’s a big concern,”
she says. “Hackers understand where the value 
is, and they will turn their attention towards 
that.” Still, Datta thinks that clouds are no more 
or less secure than any other computer network. 
A university cloud system, for example, is only 
as solid as the university’s firewall. “If I were 
working on my own or at a small college or 
company, I would probably feel more secure 
the code, computing environment and data
used to generate analyses. Project Jupyter
offers shareable notebooks that make data,
code and analysis easily accessible — and 
interactive (H. Shen, Nature 515, 151-152; 
2014). Nimbus, partly developed by the 
Argonne National Laboratory in Illinois, helps 
to turn a normal computing cluster into a 
cloud system accessible by remote users. 


• Other computing resources:

Practical Computing for Biologists, by Casey 
Dunn and Steven Haddock (Palgrave 
Macmillan; 2011). 

The Software Carpentry computing 
workshops (see go.nature.com/jg86j)). 

The University of Washington’s eScience 
Institute advice on “Which compute platform
should I use?” (see go.nature.com/iazoio).


Links to these resources, including tutorials, 
are available at the online version of this 
article. N.D. 


with Google’s cloud,” Datta says (although
Stanford has its own army of engineers watch- 
ing security). The truth is, anyone working 
with extremely sensitive data might be better 
off keeping it away from the Internet altogether. 

Another key issue for researchers who are 
venturing into cloud computing is the level 
of tech support needed. Getting software 
to run on a new system can take days, and 
determining how much computing power or 
memory a virtual machine needs can be an 
exercise in trial and error. All cloud providers 
offer training and tutorials, but dedicated 
support staff are more commonly found at 
universities with campus clouds. 
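
One way to reduce that trial and error is to measure a small pilot run before choosing a virtual-machine size. The following is a minimal sketch using Python's standard `tracemalloc` module; it assumes the analysis can be called as a Python function, and `toy_analysis` is a hypothetical stand-in for real work. Because `tracemalloc` tracks only memory allocated through Python, the figure is a lower bound for pipelines that lean on native libraries or external programs.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak Python-allocated memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 * 1024)


def toy_analysis(n_reads):
    """Hypothetical workload: hold n_reads 100-base reads and count bases."""
    reads = ["ACGT" * 25 for _ in range(n_reads)]
    return sum(len(read) for read in reads)


total_bases, peak = peak_memory_mib(toy_analysis, 10_000)
print(f"Processed {total_bases} bases; peak tracked memory {peak:.2f} MiB")
```

Repeating the measurement at two or three input sizes gives a feel for how memory grows with the data, which is usually more informative than a single number when picking a machine type.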

Despite the challenges, cloud computing is 
increasingly appealing to scientists, says Darrin 
Hanson, vice-president of Rackspace Private 
Cloud. “The last few years have been mostly 
people who are absolutely out on the bleeding 
edge,” he says. “But now we're starting to see an
influx of adopters.” 

That isn’t too surprising, Dunn says — the 
cloud is not as foreign as it can sometimes 
sound. “Nearly all consumer computer 
products now have a cloud component, be it 
mobile apps, content-streaming services like 
Netflix or desktop tools like Dropbox,” he says. 
“Research computing is not on the vanguard of 
some crazy and risky unknown frontier — we 
are just undergoing the same transitions that 
are already well under way in industry and the 
consumer marketplace.”


Nadia Drake is a freelance science writer in 
San Francisco, California. 
LET'S HAVE A TALK


BY XIA JIA (EDITED BY KEN LIU) 


There are few reasons to call a linguist
after midnight. 


It was three in the morning when 
the phone woke me. A gloomy voice said 
they needed me right now. My first response 
was: Uh-oh, they’re finally here.
Aliens. 

I met with some odd people 
in an odd dark room, where we 
watched odd video clips: a flock 
of white seal pups huddled 
together, clamouring continu- 
ously, sounding vaguely like 
a zoo mixed with a parking 
garage and a kindergarten. 

“What the hell is that?” Some- 
one beat me to the question. 

We listened to the expla- 
nation. A lab designed these 
intelligent toys, which could 
imitate and learn human lan- 
guages from scratch, as new- 
born babies do. The design 
summary claimed that the seal 
pups could ultimately master the equivalent 
of a five-year-old’s language skills. 

The lab staff had packed a hundred 
prototypes in a container to be shipped to 
beta users; however, the container was mis- 
labelled. When the container was finally 
tracked down, retrieved and opened, the 
staff found that the seals, which ought to 
have been powered down and lying on their 
bellies silently, were instead making an 
astonishing ruckus. 

“It looks like they are talking with each 
other in some alien language we can't understand,”
an incredulous voice penetrated the
darkness. 

“That is the very thing we must figure out.” 
A man in black, who was leading this mid- 
night meeting, nodded at us, poker-faced. “Is 
that possible? Who taught them? Remember, 
the container was sealed the entire time.”

“Sealed seals,” I murmured. Luckily no 
one heard me. 

“There was a similar case. ISN, Idioma de
Señas de Nicaragua,” the voice in the darkness
replied. “It’s a sign language developed
by deaf children in a number of schools in 
western Nicaragua in the 1970s and 1980s.” 

“Tell me more.” 


Evidently the man in black found this inter-
A word to the wise.


1970s, there was no deaf community in 
Nicaragua. Then a couple of vocational 
schools were established there and hundreds 
of deaf students enrolled. The language pro- 
gramme, which tried to teach students to 
lip-read Spanish words, initially achieved 
little success. Meanwhile, the schoolyard, 


the street and the school bus proved to be 
fertile testing grounds for students figuring 
out how to communicate with each other 
on their own. By combining gestures and 
elements of their individual, idiosyncratic, 
homegrown sign systems, a new type of sign 
language rapidly emerged, which is now 
known as Idioma de Señas de Nicaragua.
It is the only time that we've actually seen a
language being created out of thin air.”

“Not exactly,” another voice interrupted.
“Actually, someone later created robots with 
an ability to develop their own language. 
These ‘Lingodroids’ were designed to navi- 
gate their way through a labyrinth and to 
create words for mapped locations using a 
database of syllables. They communicated 
their findings to each other with micro- 
phones and speakers, thereby spawning new 
words for direction and distance as well.” 

“How do we know what the Lingodroids 
were talking about?” said a third voice. “Isn't 
it possible that a word that sounds innocuous 
could mean, for example, ‘armed revolt’?” 

The idea of those simple robots conspir- 
ing should have been funny, but none of us 
laughed. 

“Any more ideas?” The man in black 
looked around. 

“Why seal pups?” I asked loudly. 

“What?” 

“They look weird. Why couldn't you have 
chosen puppies or kittens?”

“I don't think that’s important.” He 
shrugged. 

“Maybe the designer wanted them to 
appear as timid and inoffensive as possible,” 
I mused. “Doesn't this imply that we fear 
talking creatures unconsciously?” 

“What's your point?” 

“I mean, why don’t we turn
off this video screen, walk out 
of this dark room, and talk 
with these... things directly, 
as we believe they've already 
developed their own language? 
All linguists know that the only 
way to learn an unknown lan- 
guage is to communicate with 
a native speaker, to point at 
objects and ask questions, and 
to answer their questions as 
well. We certainly will never 
understand what they are talk- 
ing about if we don’t knock on 
the door of that sealed container
and say hello first.”


I stepped through the door, and all the seal 
pups fell silent and watched me with their 
big crystal eyes. Thank God. Seal pups seem 
much better than creatures with teeth and 
claws. I extended both of my hands to show 
that there was no hidden weapon, just as I 
was trained to do in my first field practice, 
knowing full well that this gesture was prob- 
ably meaningless in their linguistic system. 

A ROBOT MAY NOT INJURE A 
HUMAN BEING, ALTHOUGH IT MUST 
PROTECT ITS OWN EXISTENCE. 

So high, so low, so many things to know. 

“你好。” I said hello in my mother tongue,
and waited patiently. 

The nearest seal pup put a fluffy paw in 
my flat palm, and spoke — it sounded like a 
great big yawn. 

I tried my best to imitate it. I could be say- 
ing hello, or else just yawning. Anyway it was 
not a bad start.

“TL KAT VLDL IE?” I asked gently. Let’s 


have a talk, shall we?


Xia Jia is a sci-fi writer in China. Her 
fiction has appeared in English translation 
in venues such as Clarkesworld and The 
Year's Best SF. This is her first story written 
in English and was edited by Ken Liu, a 
translator and speculative-fiction author 
whose works have appeared in F&SF, 
Asimov's, Tor.com and other venues. 
Workers in southern Italy cut down an olive tree infected with the deadly bacterium Xylella fastidiosa.


PLANT DISEASE 


Scientists blamed for olive-tree ruin


Italian police investigate researchers’ role in a bacterial 
epidemic that is devastating Puglia’s olive groves. 


BY ALISON ABBOTT 


They did not expect to be hailed as
heroes, say the scientists tasked with
researching a deadly pathogen that is
ravaging olive groves in Puglia, southern Italy.
But they certainly did not predict that they
would end up feeling like villains.

In the past year, plant scientists at various 
institutes in Bari, the capital of the Puglia 
region, have seen their work and their 


motivations criticized by local campaigners. 
Most recently, they have been subject to a 
police investigation into whether they are 
responsible for the introduction of the bac- 
terium, Xylella fastidiosa, into Puglia, or for 
allowing its subsequent spread. 

Police have questioned several scientists 
involved in Xylella research and confiscated 
computers and documents from scientific 
institutes. 

“We'd just like to be left to do our work 
without this suspicion and this stress,” says
Donato Boscia, head of the Bari unit of Italy’s 
Institute for Sustainable Plant Protection 
(IPSP), whom police questioned in April. 

“The scientists in Puglia working on the 
Xylella outbreak have been working non- 
stop for two years,” adds Rodrigo Almeida, a 
Xylella specialist at the University of Califor- 
nia, Berkeley. “Their reward has been to get 
attacked constantly — I just can’t imagine how 
this would feel.”

Xylella is endemic in parts of the Americas, 
including Costa Rica, Brazil and California, 
but was not previously found in Europe. That 
changed in October 2013, when scientists at 
the IPSP and the University of Bari identified¹
the bacterium as the cause of an unusual dis- 
ease outbreak in olive trees. The outbreak was 
immediately subjected to European Union 
(EU) regulations to limit its spread, and 
regional scientists began a systematic effort 
to understand the disease and contain it. 
Scientists went on to show that the bacterium 
was being carried by the spittlebug insect².

From the start, farmers and environmental- 
ists in Italy objected to containment measures, 
which involved uprooting trees and spraying 
the groves with pesticides. But trouble for the 
Puglian scientists began in April 2014, when 
individuals told police that they suspected the 
epidemic was caused by bacteria that scientists 
had brought from California for a European 
training course on Xylella at the Mediterranean 
Agronomic Institute of Bari (IAMB) in 2010. 

Scientists say that this suggestion is ludi- 
crous because the Puglia strain is different 
from the strains used at the workshop; the 
widely accepted theory is that the infection was 
imported with ornamental plants from Costa 
Rica, where the endemic Xylella strain matches 
the Puglia strain. However, the complaints 
spawned a much broader investigation by public 
prosecutors, including probes into what role sci- 
entists may have had in the epidemic. On 4 May, 
police confiscated computers and documents 
from the University of Bari and the IPSP, as well 
as documents from the Centre for Agricul- 
tural Research Basile Caramia in Locorotondo, 
Puglia. Two weeks later, police also seized docu- 
ments from the Italian ministry of agriculture in 
Rome. The IAMB has voluntarily passed docu-
ments to police. 

The prosecutors declined Nature’s request 
for comment. But in March, one of them, Elsa 
Valeria Mignone, implied in an interview 
with Famiglia Cristiana magazine that they
are looking into theories that the bacterium 
may have been deliberately introduced into 
the area, or became entrenched because 
agricultural scientists failed to monitor the 
region properly, either deliberately or through 
neglect. 

On 12 May, the Italian Association of Scien- 
tific Societies in Agriculture (AISSA), which 
represents 4,000 scientists in Italy, published 
a public letter defending the Puglian scien- 
tists and their work. “The claims do not have 
a scientific basis — that’s what has shocked the 
scientific community,” says Vincenzo Gerbi,
AISSA president. 

Puglian scientists have had to contend with 
public criticism, too. Several popular blogs 
devoted to the Xylella emergency have cast 
doubt on scientists’ ways of working and their 
results — saying, for example, that a cure exists 
but is being suppressed. And Peacelink, an Ital- 
ian non-governmental organization, wrote to 
the EU health commissioner in March say- 
ing that Xylella had not been proved to be the 
source of the outbreak, and that the deaths 
were instead the result of a fungus that could be 


Donato Boscia researches Xylella fastidiosa at 
Italy’s Institute for Sustainable Plant Protection. 


eliminated without destroying trees. An expert 
panel of the European Food Safety Authority 
debunked these suggestions in a report pub- 
lished in April. “It’s frustrating to hear all these 
complaints when you think you are doing a 


public service,” says Anna Maria D’Onghia,
head of the pest-management division at the 
IAMB, who has been questioned by police. 
“We are always being attacked for doing too 
little, or the wrong things.” 

Boscia says that the “attempts to delegiti- 
mize the results of scientific research” have 
been worse than the police investigations. But 
it is not all bad news for Puglian scientists. On 
27 May, the regional government announced a 
€2-million (US$2.2-million) fund for projects 
that might aid the diagnosis, epidemiology 
and monitoring of the bacterium. It said that a 
‘containment area’ in the province of Lecce
(where the bacterium is now endemic, making
complete eradication impossible) will be used
as an open-air Xylella laboratory. National and
European research agencies have also promised
money, says Boscia. “The outdoor laboratory
would be perfect for all of us, and also allow
critics to put their own theories to the test.”


1. Saponari, M., Boscia, D., Nigro, F. & Martelli, G. P. 
J. Plant Pathol. http://dx.doi.org/10.4454/JPP.V95I3.035 (2013).

2. Elbeaino, T. et al. Phytopathol. Mediterr. 53, 328-332
(2014). 


DONATO BOSCIA 


POLITICAL SCIENCE 


Retracted gay-marriage study 
debated at misconduct meet-up 


Over rum cocktails at the World Conference on Research Integrity, experts discussed what 
can be learnt from the fallout of a flawed political-science paper. 


BY RICHARD VAN NOORDEN, RIO DE JANEIRO 


The world’s largest gathering of specialists
in research misconduct kicked off on
31 May in Rio de Janeiro, Brazil, shortly
after science’s latest scandal broke. On the 
evening before the start of sessions on how to 
diagnose and remedy ethical faults in research, 
delegates to the 4th World Conference on 
Research Integrity sipped caipirinhas, Brazil's 
national cocktail — and swapped views on 
what could be gleaned from a flawed political- 
science study. 

The paper in question, which claimed to 
show that short conversations with a canvasser 
who is gay could encourage voters to support 
same-sex marriage, made headlines across the 
world when it was published in Science last 
December (M. J. LaCour and D. P. Green Sci- 
ence 346, 1366-1369; 2014) — and again when 
it was retracted last week (Science http://doi.org/4zt;
2015). “The case is very much on our
minds,” said Melissa Anderson, a co-organizer 
of the meeting who studies scientific integrity
at the University of Minnesota in Minneapolis. 

Although the case throws up new instances 
of misconduct, and of inadequate supervision 
by senior academics, delegates to the Rio con- 
ference felt that, in general, the case illuminated 
little about the academic system that a steady 
drip-drip of research misconduct has not already 
highlighted. The main challenge, said Brian 
Martinson, a social scientist at the HealthPart- 
ners Institute for Education and Research in 
Minneapolis, is how to create a supportive envi- 
ronment that incentivizes reliable, reproducible 
research. “A lot of people think the bad stuff in 
science comes from academics being greedy or 
narcissistic — but that ignores how the structural 
arrangements in science, like the decline of fund- 
ing and stable academic positions in the United 
States, leads people into bad behaviour,” he said.

In the latest twist in the debacle, co-author 
Michael LaCour, a graduate student in political 
science at the University of California, Los Ange- 
les (UCLA), has admitted to misrepresenting 
his funding sources and the incentives he used
to attract people to take part in the study. In a
29 May online reply to researchers who had 
spotted irregularities in his survey data (see 
go.nature.com/acpxnh), LaCour said that he 
had deleted his raw data for reasons of
confidentiality and admitted that he did not get
ethical approval from an institutional review
board before he did the work, or before he
submitted it to Science. The document did not
include convincing evidence that he had
conducted the surveys.

“Academia should be concerned that its system
of checks and balances has problems.”

LaCour told The New York Times that he 
stands by his finding — but his co-author 
Donald Green, a political scientist at Columbia
University in New York City, does not:
Green requested the paper’s retraction after 
three outside scientists told him about irregu- 
larities in its survey data, and he apologized for 
Ancient proteins resolve the evolutionary history of
Darwin’s South American ungulates 
No large group of recently extinct placental mammals remains as evolutionarily cryptic as the approximately 280 genera grouped as ‘South American native ungulates’. To Charles Darwin, who first collected their remains, they included perhaps the ‘strangest animal[s] ever discovered’. Today, much like 180 years ago, it is no clearer whether they had one origin or several, arose before or after the Cretaceous/Palaeogene transition 66.2 million years ago, or are more likely to belong with the elephants and sirenians of superorder Afrotheria than with the euungulates (cattle, horses, and allies) of superorder Laurasiatheria. Morphology-based analyses have proved unconvincing because convergences are pervasive among unrelated ungulate-like placentals. Approaches using ancient DNA have also been unsuccessful, probably because of rapid DNA degradation in semitropical and temperate deposits. Here we apply proteomic analysis to screen bone samples of the Late Quaternary South American native ungulate taxa Toxodon (Notoungulata) and Macrauchenia (Litopterna) for phylogenetically informative protein sequences. For each ungulate, we obtain approximately 90% direct sequence coverage of type I collagen α1- and α2-chains, representing approximately 900 of 1,140 amino-acid residues for each subunit. A phylogeny is estimated from an alignment of these fossil sequences with collagen (I) gene transcripts from available mammalian genomes or mass spectrometrically derived sequence data obtained for this study. The resulting consensus tree agrees well with recent higher-level mammalian phylogenies. Toxodon and Macrauchenia form a monophyletic group whose sister taxon is not Afrotheria or any of its constituent clades as recently claimed, but instead crown Perissodactyla (horses, tapirs, and rhinoceroses). These results are consistent with the origin of at least some South American native ungulates from ‘condylarths’, a paraphyletic assembly of archaic placentals. With ongoing improvements in instrumentation and analytical procedures, proteomics may produce a revolution in systematics such as that achieved by genomics, but with the possibility of reaching much further back in time.
Figure 1 | Samples used in this investigation. a, Predicted survival of an 80-base-pair (bp) DNA fragment after 10,000 years (10 ka) modelled using the rate given in ref. 29. b, Location of finds by Darwin and of samples used in [...] the sequenced Pleistocene SANUs are high compared with coeval horse (MACN Py 5719) as well as modern hippopotamus and tapir, providing support for the authenticity of the ancient sequences (see Supplementary [...]

South American native ungulates (SANUs) are conventionally organized into five orders (Litopterna, Notoungulata, Astrapotheria, Xenungulata, and Pyrotheria) that are sometimes grouped together as a separate placental superorder (Meridiungulata). They appear very early in the Palaeogene record and evolved thereafter along many divergent lines, as their abundant fossil record attests. Most lineages had become extinct by the end of the Miocene epoch, although a few species of litopterns and notoungulates persisted into the Late Pleistocene epoch. Despite continuing interest in their evolutionary history (for example refs 5, 11-14), phylogenetic relationships of the major SANU clades to one another and to other placentals remain poorly understood (see Supplementary Information). Although some recent investigations (for example refs 4-6) have suggested that basal South American members of Litopterna conclusively group with certain Holarctic condylarths, and are thus best placed in Euungulata (Laurasiatheria), several other studies claim to have identified potential synapomorphies linking various SANU taxa with Afrotheria. This latter view is broadly consistent with such indicators as prolonged late Mesozoic faunal exchange

Figure 2 | Relationship of Toxodon (Notoungulata) and Macrauchenia (Litopterna) to other placental mammals. Fifty per cent majority rule Bayesian consensus tree of COL1 protein sequence data, with chicken (Gallus) as outgroup. Scale bar indicates branch length, expressed as the expected number of substitutions per site. Major clades (orders and superorders) are colour coded; species names in bold indicate collagen sequences derived from MS/MS rather than genomic data; fossil taxa are depicted in silhouette. Inset: in all tree-reconstructions conducted (see Supplementary Information), Toxodon and Macrauchenia (dark grey) group monophyletically at the base of crown Perissodactyla (light green) with 100% posterior probability, forming Panperissodactyla.

82 | NATURE | VOL 522 | 4 JUNE 2015

between Gondwanan landmasses and the possibility that Xenarthra (the other major endemic South American placental clade) is also related to Afrotheria. However, most of the character evidence on which the SANU-Afrotheria sister-group hypothesis is based is in dispute. In principle, a more definitive test of phylogenetic affinities could come from genomic data, but so far the application of ancient DNA techniques has been limited and DNA survival is predicted to be poor (Fig. 1a) (see Supplementary Information).

Type I collagen (COL1), a structural protein comprising two separate chains, COL1α1 and COL1α2 (coded by genes on separate chromosomes), is known to provide useful systematic information (‘barcoding’), and can be recovered over significantly longer time spans than DNA. Most of the 48 samples of Toxodon sp. and Macrauchenia sp. we analysed for sequence information came from localities in Buenos Aires province (Supplementary Information and Fig. 1b), especially from areas that experience subtropical to maritime-temperate climates. Peptide mass fingerprinting (ZooMS) (Supplementary Information) of COL1 extracts revealed variable levels of collagen preservation in the sample set (see Supplementary Information and Extended Data Table 1). After screening, two samples each of Toxodon and Macrauchenia displaying excellent COL1 preservation (see Extended Data Fig. 1) were selected for liquid chromatography-tandem mass spectrometry (LC-MS/MS) sequencing using a variety of LC-MS/MS platforms, and direct radiocarbon dating (Supplementary Information and Extended Data Table 1).

Combining analyses from a total of eight MS/MS runs, we were able to assemble near-complete COL1 sequences for Macrauchenia (89.4%) and Toxodon (91.0%), similar to levels of sequence coverage for modern samples. Comparative analyses with fossil and modern samples suggest that our SANU COL1 sequences are authentic: COL1 amino-acid sequence variation is located in similar positions along both COL1 chains compared with collagen sequences derived from genomic sources (Extended Data Fig. 2), and deamidation ratios conform to expectations for Pleistocene samples (Extended Data Fig. 3), addressing a criticism levelled at previous pre-Holocene collagen studies. Independent manual de novo sequencing of product ion spectra for selected phylogenetically relevant peptides was in full agreement with sequence assignments from database searches. Furthermore, 86.70% and 94.41% of the assembled species consensus sequences for Macrauchenia and Toxodon, respectively, were covered by a minimum of two independent product ion spectra, with individual positions being covered by an average of 77.1 (for Macrauchenia) and 103.9 (for Toxodon) product ion spectra (Extended Data Table 2).
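
The coverage figures quoted above can be illustrated with a minimal Python sketch. It assumes a hypothetical per-residue list of spectral depths (the number of product ion spectra covering each position of the consensus) and assumes that the replicated fraction and mean depth are taken over covered positions only; neither the data layout nor the denominators are specified in the text.

```python
def coverage_summary(depths, min_spectra=2):
    """Summarise per-residue spectral depth for one consensus sequence.

    depths[i] is the (hypothetical) number of product ion spectra that
    cover residue i; 0 means the residue was not observed.
    """
    covered = [d for d in depths if d > 0]
    if not covered:
        raise ValueError("no covered positions")
    return {
        # share of all residues observed at least once
        "coverage_pct": 100 * len(covered) / len(depths),
        # share of covered residues seen in at least `min_spectra` spectra
        "replicated_pct": 100 * sum(d >= min_spectra for d in covered) / len(covered),
        # average number of spectra per covered residue
        "mean_depth": sum(covered) / len(covered),
    }


toy = coverage_summary([0, 1, 2, 3])  # toy values, not data from the study
```

With real data, `depths` would be derived from the peptide-spectrum matches reported by the search software.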

Molecular evidence for the phylogenetic placement of the extinct SANUs Macrauchenia and Toxodon was previously unavailable. To examine the phylogenetic position of these taxa, an alignment of 76 mammalian COL1 sequences and one outgroup (Gallus) was constructed from available mammalian genomic COL1 sequences in GenBank, as well as several MS/MS-derived protein sequences obtained for this study. A Bayesian phylogenetic tree was estimated from the data, with separate models of substitution applied to two partitions (COL1α1 and COL1α2). The resulting consensus tree (Fig. 2) is based solely on protein sequence data, but its topology corresponds closely to branching relationships in Placentalia recovered in recent molecular studies. Furthermore, nodes poorly supported in this study (for example, those within Laurasiatheria) involve the same series of phylogenetic relationships that have proved difficult to resolve in other studies. To examine how alternative topologies could affect the position of our target taxa we ran additional Bayesian analyses, using constraints mirroring differing mammal phylogenies (Extended Data Fig. 4 and Supplementary Information).

In all phylogenetic analyses performed with our data (including the use of unconstrained parsimony and probabilistic tree reconstruction methods), Macrauchenia and Toxodon formed a strongly supported monophyletic pair that grouped exclusively with Perissodactyla (as represented by extant Equus, Tapirus, and Ceratotherium). Neither showed any association with the clades conventionally contained in Afrotheria (see Supplementary Information). In future, and with more evidence, it may be appropriate to include these SANUs within an augmented definition of Perissodactyla. At present, we prefer to recognize Litopterna and Notoungulata as part of a branch-based rankless taxon Panperissodactyla, uniting all taxa more closely related to crown Perissodactyla than to any other extant taxon of placentals (see Supplementary Information).

Despite poor resolution at the base of Laurasiatheria, the fact that Macrauchenia and Toxodon were not recovered at a basal position within Euungulata would imply that the initial split between Perissodactyla and Artiodactyla occurred earlier than the origin of the SANU clades. Since fossil evidence indicates that both litopterns and notoungulates were already present in South America by the Early Palaeocene epoch, this would suggest that the divergence events leading to the modern orders must have occurred at, if not before, the Cretaceous/Palaeogene boundary (Supplementary Information and Extended Data Fig. 5).

These observations do not constitute a full molecular test of SANU monophyly, as there is no proteomic evidence available for members of the remaining orders (Astrapotheria, Xenungulata, Pyrotheria). As far as it is now known, Xenungulata and Pyrotheria became extinct in the Late Palaeogene, but some members of Astrapotheria (sometimes considered the sister group of Notoungulata) persisted until the Middle Miocene (16.0-11.6 million years ago (Ma) (ref. 28)). This is well beyond the extrapolated estimate of less than 4.0 Ma for good collagen survival in an optimal (cool) burial environment, although the empirical limits on collagen survival under differing environmental conditions are poorly understood at present (see Supplementary Information).

The results presented here establish that, in principle, the approximately 2,100 residues (that is, one-fifth of the amino-acid residues analysed in ref. 9) comprising bone COL1 in placental mammals are sufficiently variable to provide reliable systematic information. Of course, a phylogeny based on two genes may be sensitive to factors affecting phylogenetic resolution such as gene lineage sorting, missing taxa, aberrant molecular rates, and selection acting on protein coding sequences. Despite this, the topology derived from the collagen sequences in this study is in broad agreement with other mammalian trees, and supports monophyletic placement of two Late Quaternary SANUs with a high degree of confidence. Reliable systematic information is an essential foundation for many other enquiries in evolutionary biology, including patterns of early Cenozoic mammalian divergence, radiation, extinction, and palaeobiogeography. With further development, molecular sequencing of degradation-resistant proteins such as bone COL1 is sure to open new vistas in the study of vertebrate evolution.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS


No statistical methods were used to predetermine sample size. 

Zooarchaeology by MS screening. After zooarchaeology by MS (ZooMS) screening of selected Macrauchenia (n = 26) and Toxodon (n = 22), four bone specimens were selected for MS/MS analysis. Using a combination of enzymes, we were able to obtain sequence coverage of around 90% for COL1 for both genera. Subsamples of about 200 mg were taken from each bone or skin sample for COL1 extraction. Bone samples were demineralized in 0.6 M HCl for 8 days at 4 °C. The acid was removed and the samples were washed three times with ultrapure water then heated at 70 °C in 0.6 M HCl for 48 h to gelatinize the COL1. Samples were then ultrafiltered using 30 kilodalton filters and washed through with ultrapure water. Half a millilitre from each sample retentate was taken to dryness overnight in a vacuum centrifuge. One hundred microlitres of 50 mM ammonium bicarbonate solution (pH 8) was added to each sample. The samples were then digested with trypsin (0.5 µg µl⁻¹, for 16 h at 37 °C). After enzyme digestion, samples were acidified with 2 µl of 5% (volume %) trifluoroacetic acid (TFA). Samples were then concentrated using C18 ZipTips: the ZipTips were prepared using a conditioning solution of 50% acetonitrile, 49.9% water, 0.1% TFA; the tips were then washed with a washing solution of 0.1% TFA; the sample was then transferred over the column ten times; the tips were then washed again using 0.1% TFA solution; finally the sample was eluted using the conditioning solution. For ZooMS analysis, 1 µl of each sample was spotted in triplicate onto a ground steel plate with 1 µl of CHCA matrix solution (1% in 50% ACN/0.1% TFA (v/v/v)). MS analysis was performed on a Bruker ultraflex matrix-assisted laser desorption/ionization-tandem time of flight (MALDI-TOF/TOF) mass spectrometer over the m/z range 800-4,000 (Extended Data Fig. 1). Screening revealed large differences in COL1 spectral quality between samples. Of 46 SANU samples, only five (3 out of 20 from Toxodon, 2 out of 25 from Macrauchenia) yielded good ZooMS spectra. One of the three Toxodon samples (ZMK 22/1889) produced a few strong MS/MS spectra and only four samples (two each from Macrauchenia and Toxodon) were used in the main study.

MS/MS sequence analysis. Selected collagen extracts from pooled trypsin (0.4 µg µl⁻¹, 16 h, 37 °C) and elastase digests (0.8 µg µl⁻¹, 16 h, 37 °C) of two specimens of each SANU sample were analysed on both Thermo Scientific Orbitrap and Bruker maXis HD LC-MS/MS platforms. Additionally, Orbitrap and maXis HD instruments were also used for sequencing collagen from modern aardvark (Orycteropus afer), silky anteater (Cyclopes didactylus), hippopotamus (Hippopotamus amphibius), and South American tapir (Tapirus terrestris), as well as Pleistocene Mylodon darwinii and Equus sp. samples from South America.

Hybrid Quadrupole-Orbitrap. Sample separation was performed on an Ultimate 3000 RSLCnano LC system (Thermo Scientific). Peptides were first trapped on a PepMap µ-pre-column (0.5 cm × 300 µm; Thermo Scientific) and separated on an EASY Spray PepMap UHPLC column (50 cm × 75 µm, 2 µm particles, 40 °C; Thermo Scientific) with a 60 min multi-step acetonitrile gradient ranging from 2% to 35% mobile phase B (mobile phase A: 0.1% formic acid/5% dimethylsulfoxide (DMSO) in water; mobile phase B: 0.1% formic acid/5% DMSO in acetonitrile) at a flow rate of 250 nl min⁻¹. Mass spectra were acquired on a Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer at a resolution of 70,000 at m/z 200 using an ion target
of 3 X 10° and maximal injection time of 100 ms between m/z 380 and 1,800. Pro- 
duct ion spectra of up to 15 precursor masses at a signal threshold of 4.7 x 10* counts 
and a dynamic exclusion for 27 s were acquired at a resolution of 17,500 using an ion 
target of 10° and a maximal injection time of 128 ms. Precursor masses were iso- 
lated with an isolation window of 1.6 Da and fragmented with 28% normalized 
collision energy. 

Bruker maXis HD. Sample separation was performed on an Ultimate 3000 RSLCnano LC system (Thermo Scientific). Peptides were first trapped on a PepMap pre-column (2 cm × 100 µm; Thermo Scientific) and separated on a PepMap UHPLC column (50 cm × 75 µm, 2 µm particles; Thermo Scientific) with a 120 min multi-step acetonitrile gradient ranging from 5 to 35% mobile phase B (mobile phase A: 0.1% formic acid in water; mobile phase B: 0.1% formic acid in acetonitrile) at a flow rate of 400 nl min⁻¹. A CaptiveSpray nanoBooster source (Bruker Daltonik), with acetonitrile as a dopant, was used to interface the LC system to the maXis HD UHR-Q-TOF system (Bruker Daltonik). Source parameters were set to 3 l min⁻¹ dry gas and 150 °C dry heater; nitrogen ‘flow’ setting for the nanoBooster was set to 0.2 bar. Mass spectra were acquired in the m/z range 150-2,000 at a spectral acquisition rate of 2 Hz. Precursors were fragmented with a fixed cycle time of 4 s using a dynamic method adapting spectra rates between 2 and 10 Hz based on precursor intensities. Dynamic exclusion was set to 0.4 min combined with reconsideration of an excluded precursor for fragmentation if its intensity rose by a factor of 3.

Collagen type I sequence assembly. Product ion data from the maXis HD and Orbitrap platforms were analysed in three stages. Initially MASCOT (Matrix Science) was used to search against the UniColl database, a database of non-redundant synthetic collagen peptides, to generate a list of ranked peptides for each spectrum. Sequences derived from this exercise were added to a local database of genomic and published collagen sequences and common laboratory contaminants, and the original data were then re-analysed by PEAKS using this new database (for an example of PEAKS output see Extended Data Fig. 1b-d).

As an independent check, a limited number of the product ion spectra of peptides (previously assigned by PEAKS) were also manually de novo interpreted (by J.T.-O.) without prior knowledge of the assignment, in all cases with full agreement between the two approaches.

Generation of, and searching against, Unicoll. Publicly available COL1α1 and COL1α2 sequences were concatenated and aligned using Mafft with subsequent manual alignment of misaligned sites using Bioedit and Geneious version 4.6 (ref. 33). A custom Python script was used to digest the COL1 with trypsin in silico. For each tryptic fragment, all variable amino-acid positions across the aligned sequences were recorded. A new sequence was created for every permutation of these variable sites. These sequences were concatenated and stored in FASTA format with a header indicating the position in the original alignment. The result was a database with each entry a concatenation of sequences representing every permutation of observed mutations for that particular tryptic fragment. One tryptic fragment of the sequence (COL1α2 positions 870-905) was too variable to include without exceeding available memory. Only the original observed variants were included for this part of the sequence. Using this strategy, it was possible to generate the equivalent of more
than 10° alternative collagen ‘sequences’ (cf. 10°”, which is the upper estimate of 
the number of atoms in the universe). 
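
The database construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' custom script: the function names, the header format and the toy alignment are hypothetical, cleavage sites are taken from the first aligned sequence only, cleavage is after every K or R (as with Trypsin/P), and alignment gaps are not handled.

```python
from itertools import product


def tryptic_spans(reference):
    """Return (start, end) column spans of tryptic fragments in `reference`,
    cleaving after every K or R (assumption: no proline rule, no gaps)."""
    spans, start = [], 0
    for i, residue in enumerate(reference):
        if residue in "KR":
            spans.append((start, i + 1))
            start = i + 1
    if start < len(reference):
        spans.append((start, len(reference)))
    return spans


def fragment_permutations(aligned, start, end):
    """All combinations of the residues observed at each column of one fragment."""
    columns = [sorted({seq[col] for seq in aligned}) for col in range(start, end)]
    return ["".join(combo) for combo in product(*columns)]


def build_entries(aligned):
    """Map a header (1-based alignment span, hypothetical format) to the
    concatenation of every permutation for that tryptic fragment."""
    entries = {}
    for start, end in tryptic_spans(aligned[0]):
        header = f"pos_{start + 1}-{end}"
        entries[header] = "".join(fragment_permutations(aligned, start, end))
    return entries


# Toy two-sequence alignment (not real collagen data).
toy_alignment = ["GPAGERGSPGK", "GPSGERGAPGK"]
entries = build_entries(toy_alignment)
```

Because the number of permutations is the product of the number of residues observed at each variable column, a highly variable fragment grows combinatorially, which is consistent with the memory limit reported for one COL1α2 fragment.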

MS/MS data files were merged and submitted to Mascot with enzyme set to Trypsin/P; variable modifications for deamidated (NQ), Lys→Hyl (K), oxidation (M), and Pro→Hyp (P); peptide mass tolerance ±10 ppm; and fragment mass tolerance ±0.07 Da. The structure of sequence entries in Unicoll meant that it could not accommodate missed cleavages. Select summaries containing matched peptides with a Mascot score greater than 30 were exported into Microsoft Excel for each analysis. Peptides were identified by picking the highest scoring hit for each tryptic fragment if the score exceeded 40; for matches with scores between 30 and 40, the spectra were inspected manually to choose the best hit among the possibilities given by the search engine.
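
The score-based triage described above can be expressed as a short Python sketch. The tuple layout, function name and example hits are hypothetical; only the two thresholds (30 and 40) come from the text, and the manual inspection step itself is of course not automated here.

```python
def triage_hits(hits, accept_above=40.0, export_above=30.0):
    """Split the best hit per tryptic fragment into accepted and to-review.

    hits: iterable of (fragment_id, peptide, mascot_score) tuples
          (hypothetical layout for exported search results).
    Returns (accepted, needs_review): accepted maps fragment -> peptide for
    best scores above `accept_above`; needs_review lists fragments whose
    best score lies in (export_above, accept_above].
    """
    best = {}
    for fragment, peptide, score in hits:
        if score <= export_above:
            continue  # only matches scoring above 30 were exported
        if fragment not in best or score > best[fragment][1]:
            best[fragment] = (peptide, score)
    accepted = {f: p for f, (p, s) in best.items() if s > accept_above}
    needs_review = sorted(f for f, (_, s) in best.items() if s <= accept_above)
    return accepted, needs_review


# Toy hits, not data from the study.
toy_hits = [
    ("f1", "PEPA", 55.0),
    ("f1", "PEPB", 42.0),
    ("f2", "PEPC", 35.0),
    ("f3", "PEPD", 12.0),
]
accepted, needs_review = triage_hits(toy_hits)
```

Fragments returned in `needs_review` correspond to the cases in which the spectra were inspected manually.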

Searching data using PEAKS. Product ion spectra were searched using PEAKS software against a database comprising genomic COL1 sequences plus fossil consensus sequences, composed of UniColl peptide hits, with missing and low coverage regions filled with conserved mammalian COL1 sequences (see Phylogenetic reconstruction section below). Additionally, common laboratory contaminants were included in database searches. Full PEAKS searches (Peptide de novo, PEAKS DB, PEAKS PTM, and SPIDER) were performed with peptide mass tolerance ±10 ppm and fragment mass tolerance ±0.07 Da, in addition to respective platform and enzyme details. Searches were performed allowing for deamidated (NQ), Lys→Hyl (K), oxidation (M), and Pro→Hyp (P). False discovery rate was set at 0.5% and peptide scores were only accepted with −10log10(P value) scores of at least 30 and average local confidence (%) at least 50. Where there was ambiguity in interpretation of the spectra, peptides were selected on the basis of knowledge of sequence constraints, post-translational modifications, and fragmentation patterns.
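
As a worked illustration of the acceptance thresholds above, the following Python sketch converts a −10log10(P value) score back to a P value and applies both cut-offs. The function names are hypothetical, and applying both thresholds to the same peptide record is an assumption made for simplicity.

```python
def p_value_from_score(score):
    """Invert score = -10 * log10(P): a score of 30 corresponds to P = 0.001."""
    return 10 ** (-score / 10)


def passes_filters(score, alc_percent, min_score=30.0, min_alc=50.0):
    """True if a peptide meets both thresholds quoted in the text
    (assumption: both values are available for the same peptide)."""
    return score >= min_score and alc_percent >= min_alc
```

A score of at least 30 therefore means a P value of at most 0.001 for the match.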

Reference sequence authentication. To check the quality of our MS/MS COL1 sequences, we analysed one modern and one fossil sample for which independent genomic data were available: (1) a modern aardvark sample (Orycteropus afer) and (2) a fossil equid bone from a geological formation rich in SANU fossils, each compared with its respective genome sequence. The fossil sample had similar collagen yields and ZooMS profile to the SANU samples used for MS/MS sequencing (Pleistocene horse, Tapalqué, South America; Fig. 1b) (MS/MS sequence analysis, above). Our modern aardvark MS/MS sequence was identical to that of the protein product inferred from the released genomic sequence. For the Pleistocene Equus sp. sequence, two amino-acid substitutions were detected (T>L, COL1α1; H>D, COL1α2), similar to the maximum number of differences observed in a recent study comparing Equus genomes with the Equus ferus caballus reference genome.

De novo sequence authentication. The absence of corresponding genomic data prevented similar comparisons with MS/MS-derived sequences for the SANU species. Instead we assessed amino-acid substitution locations along COL1α1 and COL1α2 chains both in our (and previously published) and in fossil COL1 sequences with genomic data, using the COL1α1 and COL1α2 sequence of the Tasmanian devil (Sarcophilus harrisii) as an outgroup to eutherian mammals. Carboxy- (C-) and amino- (N-)terminal telopeptides were removed as they were rarely observed from fossil samples. COL1 position numbers are given as a continuous count with COL1α1 and COL1α2 concatenated, with COL1α1 ranging from position 1 to position 1014, and COL1α2 ranging from 1015 to 2028.
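
The concatenated numbering can be made explicit with a small Python helper. The function name and chain labels are hypothetical; the chain-local index is an assumption that simply counts from the first residue retained after telopeptide removal.

```python
ALPHA1_LENGTH = 1014   # COL1α1 occupies positions 1-1014
TOTAL_LENGTH = 2028    # COL1α2 occupies positions 1015-2028


def locate(position):
    """Map a concatenated COL1 position to (chain, position within chain)."""
    if not 1 <= position <= TOTAL_LENGTH:
        raise ValueError(f"position {position} outside 1-{TOTAL_LENGTH}")
    if position <= ALPHA1_LENGTH:
        return ("alpha1", position)
    return ("alpha2", position - ALPHA1_LENGTH)
```

For example, the region 1024-1089 mentioned later in the Methods lies entirely within COL1α2 under this numbering.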

We found that the location of amino-acid variation along the COL1α1 and COL1α2 chains was similar among the different COL1 sequences obtained from genomic sources (Extended Data Fig. 2). We identified several regions, mainly located in COL1α1, that appeared to lack sequence variation among the four major mammalian superorders. This could be a result of the functional importance of some of these regions during COL1 fibril formation, α1 and α2 chain binding, and COL1 hydroxylation. Additionally, we observed a substitution rate in COL1α2 roughly twice that observed in COL1α1.

Comparing COL1 sequences derived from MS/MS data in this and an earlier study with genomic data for laurasiatheres revealed good correspondence in the location of substitutions along the COL1α1 and COL1α2 chains between our results and genomic data (Extended Data Fig. 2). The MS/MS data in ref. 35 for laurasiatheres were derived from a single species (Manis tetradactyla). Sequence variation in those data was similar to that in the genomic data, although we noted that several regions displaying high rates of amino-acid substitution were missing from the Manis consensus sequence provided (notably around positions 726-756, 991-1089, 1306-1364, 1423-1443 and 1899-1977).

Reference 35 provided two xenarthran and five afrotherian COL1 sequences 
obtained using mass spectrometric sequencing. Regions with high substitution 
complexity were missing from the consensus sequences provided, for Afrotheria 
(1024-1089, 1588-1599, 1740-1754) and Xenarthra (1024-1089, 1207-1234, 
1348-1364, 1588-1638, 1771-1806, 1921-1947). The absence of such regions pro- 
hibited the inclusion of these sequences in our phylogenetic tree-building, as the 
majority of informative positions were missing from the sequences provided. For 
substitution locations, our data suggest structural and/or functional organization 
of these, and their frequency, in specific regions of both chains. 

We criticized claims of authentic collagen sequences retrieved from a Tyrannosaurus rex sample based in part on the low levels of reported deamidation, and more recently have demonstrated an increase in glutamine deamidation in archaeological compared with modern collagen, which correlated with thermal age (Extended Data Fig. 3 and ref. 41); similar levels have been reported for Pleistocene mammoths and equids.

Deamidation ratios observed for glutamine here are consistent with ancient col- 
lagen of equivalent thermal age (Extended Data Fig. 1). The lowest levels of Gln to 
Glu deamidation are observed in modern samples from hippopotamus (1.8% + 3.2) 
and tapir (5.7% + 10.9) bone. The highest levels of Gln deamidation occur in the 
radiocarbon samples from dead Macrauchenia (Glu = 82.8% + 14.3). The Toxodon 
samples are less deamidated (Glu = 59.2% + 24.5), which is consistent with a Late 
Pleistocene date (12,000 years ago). However, by contrast, the Pleistocene equid is 
much better preserved (Glu = 18.9% + 18.4), despite the fact that it cannot be much 
younger than Toxodon (Fig. 1c). 

DNA extraction and sequencing. Approximately 250 mg of the three samples 
with the highest number of peaks in the mass spectra from each species (see Zoo- 
archaeology by MS screening, above) were used for DNA extraction. DNA extrac- 
tion was performed as in the method described in ref. 42. PCR primers were designed 
to target Perissodactyla- and Laurasiatheria-specific regions of the cytb, COX1, 16S, 
and 12S genes using mitochondrial DNA sequences downloaded from the National 
Center for Biotechnology Information (NCBI) (Supplementary Table 1). Primer 
design used the program Primer3. PCR was performed for 60 cycles and samples 
were visualized on 2.5% agarose gel. Products were successfully amplified from 
several samples whereas PCR controls showed no amplification products. BLAST 
searches of the sequences obtained revealed no homology to any previously derived 
sequence for several of the products, whereas sequences from two Macrauchenia 
samples showed high similarity (98% and 99%, respectively) to domestic pig sequences, a common contaminant in ancient DNA analyses. A Pleistocene horse
bone from the same depositional context as some of the SANU specimens yielded 
a sequence 98% identical to modern horse (Equus caballus), suggesting that the 
failure to amplify putative SANU DNA sequences by PCR was not because of tech- 
nical problems, but because of a lack of endogenous DNA in the samples investigated. 

DNA next-generation sequencing approach. After failing to amplify endogenous DNA through Sanger sequencing of targeted PCR products, we applied a non-targeted, next-generation sequencing (NGS) shotgun approach in a further attempt to identify whether endogenous DNA could be obtained. Based on the collagen sequencing results, Macrauchenia sample 12-1641 (metapodial) was selected as the most likely candidate for NGS analyses. DNA extractions of Macrauchenia sample 12-1641 followed protocols described in ref. 44 and were performed in the dedicated ancient-DNA laboratory at Royal Holloway, University of London, UK. The library was constructed in a dedicated laboratory for ancient DNA (Johannes Gutenberg University, Mainz, Germany) using a modified version of the protocol in ref. 45. Modifications were as follows: the initial DNA fragmentation step was not required, and all clean-up steps used MinElute PCR purification kits. For the blunt-end repair step, Buffer Tango and ATP were replaced with 0.1 mg ml⁻¹ BSA and 1× T4 DNA ligase buffer. The following clean-up step was replaced by an inactivation step, heating to 75 °C for 10 min. For the adaptor ligation step, 0.5 mM ATP replaced the T4 DNA Ligase buffer. The index PCR step followed a further protocol using AmpliTaq Gold DNA polymerase and the addition of 0.4 mg ml⁻¹ BSA. The index PCR was set for 20 cycles with three PCR reactions conducted per library. The indexed library was sequenced on an Illumina HiSeq platform (Mainz) using a single lane, paired-end read, sequencing run.

Bioinformatics methods and conclusion. Paired-end reads were quality trimmed (q = 10) with cut-adapt and then sequences were simultaneously adaptor trimmed and the paired reads joined together with Seq-Prep (available from https://github.com/jstjohn/SeqPrep). Reads shorter than 17 base pairs were discarded. In the absence of any close phylogenetic relative (required for the accurate genomic mapping of reads), processed reads were de novo assembled into contigs using clc_denovo_assembler (available in CLC Assembly Cell version 4.2), with contigs shorter than 70 base pairs discarded. Two approaches were then used to investigate the data for mammalian genomic sequences (which had proved successful for other ancient DNA NGS samples).
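
The two length cut-offs above amount to a simple filter, sketched here in Python. The function and constant names are hypothetical, and sequences are represented as plain strings rather than FASTQ or FASTA records; in the study itself these steps were carried out by the tools named in the text.

```python
MIN_READ_LENGTH = 17     # reads shorter than 17 bp were discarded
MIN_CONTIG_LENGTH = 70   # contigs shorter than 70 bp were discarded


def drop_short(sequences, min_length):
    """Keep only sequences whose length is at least `min_length`."""
    return [seq for seq in sequences if len(seq) >= min_length]


# Toy reads of 16, 17 and 40 bases (not data from the study).
toy_reads = ["A" * 16, "A" * 17, "A" * 40]
kept_reads = drop_short(toy_reads, MIN_READ_LENGTH)
```

The same helper applies to assembled contigs by passing `MIN_CONTIG_LENGTH`.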

To examine whether there were any mammalian DNA sequences suitable for 
phylogenetic analysis in our data set, first, contigs were blasted using blastn to a local 
nucleotide database, downloaded from NCBI. Custom perl scripts (available on 
request) were used to assign taxonomic and gene information to BLAST hits. These 
results were searched for standard orthologous mitochondrial and nuclear phylo- 
genetic sequences. Each of the potential hits blasting to mammalian sequences was 
inspected; however, all were assignable to bacterial elements, and no blast hit could 
be attributed to mammalian genes. 

Second, two separate BLAST databases were generated: one from the contigs and a second from the processed reads, using the makeblastdb command in BLAST. These databases could then be queried with mammalian (including perissodactyl) mitochondrial and nuclear phylogenetic sequences of interest using blastn. Neither of these searches returned any matching contigs. Thus, the NGS data set yielded nothing of use for phylogenetic analysis, and gave no indication that any Macrauchenia DNA had persisted in the sample.
Phylogenetic reconstruction. Before the advent of DNA-based molecular phylo- 
geny, variations in protein structure and sequence had been used to explore evo- 
lutionary relationships**’. The comparative data set for this paper was built using 
consensus amino-acid sequences for COL1a1 and COL1a2 generated by MS/MS 
analysis for the target taxa Toxodon and Macrauchenia as well as representatives of 
all extant major mammalian clades. Leucine (L) and isoleucine (I) were converted 
into isoleucines as these are isobaric and low-energy MS/MS sequencing is not capable
of discriminating between them. PartitionFinder was used to select the best-fit
partitioning scheme from the amino-acid data. This was identified as two separate
partitions, for COL1a1 and COL1a2. Bayesian phylogenies were generated using
MrBayes version 3.2.1 (ref. 51) with the amino-acid model estimated from the
data (to allow model jumping between fixed-rate amino-acid models, the prior for
the amino-acid model was set as prset aamodelpr = mixed). The proportion of
invariant sites, and the distribution of rates across sites (approximating to a gamma 
distribution), were also estimated from the data. Two chains were run for 5 million 
generations (sampled every 500), with convergence between chains assessed in 
Tracer version 1.6 (ref. 52). All effective sample sizes of parameters were greater than 
100. After burn-in was removed, a majority rule consensus tree was constructed, 
using the sumt command in MrBayes, from the trees sampled in the posterior 
distribution. 
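
The leucine/isoleucine recoding mentioned above is simple to express in code. The sketch below is illustrative only; the function names are hypothetical, and the example peptide is the one quoted in the Extended Data Figure 1 legend.

```python
def collapse_leu_ile(seq):
    """Recode leucine (L) as isoleucine (I); the two residues are isobaric."""
    return seq.upper().replace("L", "I")

def same_after_collapse(a, b):
    """True if two peptides differ only at L/I positions."""
    return collapse_leu_ile(a) == collapse_leu_ile(b)

peptide = "GPNGEAGSAGPTGPPGLR"
print(collapse_leu_ile(peptide))
```

Applying the same recoding to every taxon, including those with genomic sequences, keeps an L/I difference that mass spectrometry cannot see from being scored as a substitution.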

To test for the robustness of the results of the Bayesian analysis under other me- 
thods of tree reconstruction, we also conducted maximum likelihood and max- 
imum parsimony analyses. We performed parsimony analyses running PAUP* 
version 4.0b10 (ref. 53), using the heuristic search option with a random taxon 
addition sequence (1,000 repetitions) and TBR branch swapping, and rooting the 
tree along the branch leading to Aves. A maximum likelihood phylogeny was esti- 
mated in RAxML version 7.0.4 (ref. 54). A Dayhoff model of protein sequence evo- 
lution with gamma-distributed variation in rates across sites (corresponding to the 
PROTGAMMADAYHOFFF model in RAxML) was applied to each partition. 
Twenty separate maximum likelihood analyses were performed (using the ‘-f d’
command in RAxML), and the tree with the highest likelihood was chosen from
this set. 
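
As an aid to reading the parsimony analysis, the following sketch scores fixed trees with Fitch's algorithm, the counting step that underlies maximum parsimony. It does not reproduce the PAUP* heuristic search or TBR swapping; the four taxa (`t1` to `t4`) and the two alignment columns are invented for illustration.

```python
def fitch(tree, states):
    """Return (state_set, changes) for a rooted binary tree at one column.

    tree:   a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> observed residue at that column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, columns):
    """Total number of inferred substitutions over all columns."""
    return sum(fitch(tree, col)[1] for col in columns)

columns = [
    {"t1": "G", "t2": "G", "t3": "S", "t4": "S"},
    {"t1": "P", "t2": "P", "t3": "P", "t4": "T"},
]
tree_a = (("t1", "t2"), ("t3", "t4"))
tree_b = (("t1", "t3"), ("t2", "t4"))
print(tree_length(tree_a, columns), tree_length(tree_b, columns))
```

A heuristic search proposes many rearranged topologies and keeps those with the smallest such length.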

Molecular clock analysis. Fossil-calibrated phylogenies were constructed in BEAST 
version 1.7 (ref. 55) with the Dayhoff amino-acid model (chosen under the MrBayes 
mixed model) together with the proportion of invariant sites and the distribution 
of rates across sites (approximating a gamma distribution) applied to each parti- 
tion. Analyses were run under a strict clock (estimated from the data), with the Yule 
model of speciation, for 10 million generations (sampled every 1,000 generations). 
Clock and tree parameters were linked across partitions. Prior distributions on the 
root and 33 other nodes were applied based on an interpretation of the mammalian 
fossil record (see Supplementary Table 3). The clock rate prior was set as an
uninformative uniform distribution (upper = 10^10, lower = 10^-12). All other priors
were left as the default values in BEAUti. Full details of all prior distributions for
divergence times are presented in Supplementary Table 3. As in the case of the 
MrBayes analysis, convergence and effective sampling were assessed using Tracer 
1.7. A maximum clade credibility tree was constructed using TreeAnnotator (available
with BEAST) from the trees sampled in the posterior distribution.
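
Effective sample size, used above as a sampling diagnostic, can be approximated from a parameter trace as n / (1 + 2 Σ ρ_k). The sketch below uses a simple truncation rule (stop at the first non-positive autocorrelation); this is a common textbook estimator and is an assumption here, not necessarily the algorithm implemented in Tracer. The synthetic traces are for illustration only.

```python
import random

def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    var = sum(c * c for c in centred) / n
    if var == 0:
        raise ValueError("ESS is undefined for a constant trace")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

def ar1_trace(n, phi, seed):
    """Synthetic trace: phi = 0 gives independent draws, phi near 1 a sticky chain."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

independent = ar1_trace(1000, 0.0, seed=1)
sticky = ar1_trace(1000, 0.9, seed=1)
print(round(effective_sample_size(independent)), round(effective_sample_size(sticky)))
```

A strongly autocorrelated chain yields far fewer effectively independent samples than its nominal length, which is why a minimum ESS is checked before summarizing the posterior.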
Extended Data Figure 1 | Examples of MALDI-TOF-MS and MS/MS product ion
spectra. a, MALDI-TOF-MS ZooMS spectra for Toxodon (upper) and Macrauchenia
(lower) were used to screen for samples for the best collagen preservation.
b, PEAKS alignment of matching product ion spectra for Macrauchenia
MLP 96-V-10-19 (specimen sample number MLP2012.12) highlighting peptides
aligning to the sequence GPNGEAGSAGPTGPPGLR. c, d, Annotated PEAKS report of
product ion spectra for the same peptide sequence detailed in b for Toxodon (c)
and Macrauchenia (d), detailing differences between both genera (gsT and gsA,
highlighted) and shared substitutions compared with Equus (gpA for Equus, gpT
for Toxodon and Macrauchenia). Note in b that both deamidation (ND) and
variable hydroxylation (P-h) were detected in different peptides covering this
region of the sequence.
Extended Data Figure 2 | Collagen type I substitution variability for
placental mammals (genomic and proteomic data) compared with the
dasyurid marsupial Sarcophilus harrisii (Tasmanian devil) as outgroup.
Substitution variability scores range between 0 and 1 and incorporate sequence
coverage for a given number of species over a 15-amino-acid moving average
(95% standard deviation in lighter tone). Top, along-chain variation in genomic
sequence variability (upper red) is similar to proteomic sequence variability
(lower blue) both for COL1α1 and for COL1α2 chains. Bottom, molecular
surface rendering (via VMD) of the collagen unit cell taken from coordinates
given in Protein Data Bank accession number 3HR2. Colours represent
genomic (left) and proteomic (right) sequence variability throughout
the structure.
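
The 15-amino-acid moving average in this legend can be sketched as a centred window over per-position scores. The legend does not define how the raw scores are computed or how chain ends are treated, so the edge handling below (shrinking the window at the ends) and the toy input are assumptions.

```python
def moving_average(scores, window=15):
    """Centred moving average; the window shrinks at the ends of the chain."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        chunk = scores[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Toy per-position variability scores between 0 and 1.
toy = [0, 0, 1, 0, 0]
print(moving_average(toy, window=3))
```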
Extended Data Figure 3 | Comparison of levels of deamidation for samples
in this study with ref. 22 (diamonds). The Macrauchenia sample was ¹⁴C
dead, consistent with observed levels of deamidation, which are lower than
either Toxodon dated to 12,000 years ago or Equus sp. (Tapalqué; not dated).
Dotted lines indicate error ranges on Gln estimation for samples that were not
dated or were undateable. The measurement approach used in this study
(frequency of deamidation in positions represented in at least seven MS/MS
spectra) is different from the approach used in ref. 22, so the absolute values
may not be directly comparable.
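
One way to read the measurement described in this legend is sketched below: positions covered by fewer than seven MS/MS spectra are excluded, and deamidation is expressed as a pooled fraction over the remaining positions. Pooling across positions (rather than averaging per-position fractions) is an assumption, and the counts are invented.

```python
def deamidation_frequency(observations, min_spectra=7):
    """Pooled deamidation fraction over well-covered positions.

    observations: dict mapping position -> (deamidated_count, total_spectra).
    Returns None if no position reaches the coverage threshold.
    """
    kept = {pos: (d, n) for pos, (d, n) in observations.items()
            if n >= min_spectra}
    if not kept:
        return None
    deamidated = sum(d for d, _ in kept.values())
    total = sum(n for _, n in kept.values())
    return deamidated / total

# Hypothetical counts: position 25 has only 4 spectra and is excluded.
toy = {10: (3, 10), 25: (1, 4), 40: (5, 10)}
print(deamidation_frequency(toy))
```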
Extended Data Figure 5 | Maximum clade credibility phylogeny from
BEAST molecular dating analysis. Branch lengths are measured in millions of
years; scale axis indicates intervals of 100 Ma. Node labels show 95% highest
probability densities for molecular dates (in millions of years). Fossil
constraints are provided in Supplementary Table 3. Vertical dashed line
indicates Cretaceous/Palaeogene boundary.
Extended Data Table 1 | Toxodon and Macrauchenia specimens used in this study




Effective Burial 


Genus Species ae ponies MS/MS code Location Longitude Latitude Province Age as recorded Element ee = ee iar Ta renee ka@ 
Toxodon sp. MLP 12-1132 Arrecifes -60.1 34.4 Buenos Aires Pampean Fm Axis 3 8
Toxodon sp. MLP 42-1159 Olivera 59.2 34.6 Buenos Aires Pampean Fm Skull cap, juvenile 13 94
Toxodon sp. MLP 12-2432 no location - - Buenos Aires Pampean Fm Jaw, juvenile
Toxodon sp. MLP M-218 no location - - Buenos Aires Pampean Fm Mandible
Toxodon sp. MLP 12-1169 Arrecifes -60.1 34.4 Buenos Aires Pampean Fm Mandible 13 93
Toxodon sp. MLP 12-1224 Lujan -59.0 34.5 Buenos Aires Pampean Fm Upper incisor
Toxodon sp. MLP 12-1125 Arrecifes -60.1 34.4 Buenos Aires Pampean Fm 3rd lumbar 13 94
Toxodon platensis MLP 12-1190 Chelforé -39.0 66.5 Buenos Aires Pampean Fm Turbinals
Toxodon sp. MLP 12-1227 Lujan -59.0 34.5 Buenos Aires Lujanian Incisor 17 188
Toxodon sp. MLP 12-1160 Lujan -59.0 34.5 Buenos Aires Pampean Fm Pterygoid 17 188
Toxodon sp. MLP 94-II-1-17 Rio Quequén Salado -60.5 -38.9 Buenos Aires Pleistocene Ulna 10 51
Toxodon sp. MLP 94-II-1-17 Rio Quequén Salado -60.5 -38.9 Buenos Aires Pleistocene Ulna 10 51
Toxodon sp. MLP 44-XII-29-5 MLP2012.04 Tapalqué -60.0 36.3 Buenos Aires Pleistocene Mandible 11,900 35 (UCIAMS 143034) 13 22
Toxodon sp. MLP 42-1180 Lujan -59.0 -34.5 Buenos Aires Pampean Fm Maxilla 17 188
Toxodon sp. MACN Pv 5287 no location - - - Fm Pampeana Metacarpal
Toxodon sp. MACN Pv 17710 Arroyo Tapalqué -60.0 -36.4 Buenos Aires L Pleistocene Tibia 12,040 70 (UCIAMS 143035) 13 22
Toxodon sp. MACN Pv 17710 MACN2012.12 Arroyo Tapalqué -60.0 36.4 Buenos Aires L Pleistocene Tibia 12,040 70 (UCIAMS 143035) 13 22
Toxodon sp. MACN Pv 5717 Arroyo Tapalqué -60.0 36.4 Buenos Aires Pampean Fm/ Lujanian Molar 12 72
Toxodon sp. MACN Pv 2760 no location - - - - Metapodial
Toxodon sp. MACN Pv 5712 Arroyo Tapalqué -60.0 -36.4 Buenos Aires Pampeana/Lujanian Tibia 12 72
Toxodon sp. MACN Pv 9666 Rio Quequén Salado -60.7 38.4 Buenos Aires L Pampeano Fm Cervical 10 52
Toxodon platensis ZMK. 16/1887 Arroyo del Medio -60.8 -33.6 Buenos Aires/ Santa Fe Pleistocene Jaw 14 402
Macrauchenia patachonica MLP 12-1641 Lujan -59.0 34.5 Buenos Aires Pampean Fm Metapodial 12,185 55 (OxA-25840) 14 28
Macrauchenia patachonica MLP 12-1648 Lujan -59.0 34.5 Buenos Aires Pampean Fm Metapodial 17 188
Macrauchenia patachonica MLP 12-1659 Lujan -59.0 34.5 Buenos Aires Pampean Fm Metapodial 17 188
Macrauchenia patachonica MLP 12-1488 no location - - Buenos Aires - Phalanx
Macrauchenia sp. MLP 96-V-10-19 Rio Pilcomayo -57.7 -25.2 Formosa Pleistocene Thoracic no date 15 0
Macrauchenia sp. MLP 96-V-10-19 MLP2012.12 Rio Pilcomayo -57.7 -25.2 Formosa Pleistocene Thoracic no date 15 0
Macrauchenia sp. MLP 50-X-5-5 Mar del Plata -57.6 38.1 Buenos Aires - Jaw 11 59
Macrauchenia sp. MLP 7A-III-6-1 Rio Salado -61.0 -34.6 Buenos Aires - Tibia 13 89
Macrauchenia patachonica MLP 12-1434 Lujan -59.0 34.5 Buenos Aires Pampean Fm Mandible 17 188
Macrauchenia patachonica MLP 12-1458 Lujan -59.0 34.5 Buenos Aires Pampean Fm Mandible 17 188
Macrauchenia patachonica MLP 42-2826 no location - - Buenos Aires Pampean Fm Skull, tooth
Macrauchenia patachonica MLP 12-1660 Lujan -59.0 34.5 Buenos Aires Pampean Fm Tarsal 17 188
Macrauchenia sp. MLP 80-IX-5-1 Laguna de la Bombilla -69.3 44.4 Chubut Pleistocene Cervical
Macrauchenia sp. MLP 12-1661 Lujan -59.0 34.5 Buenos Aires Pampean Fm Pedal sesamoid 17 188
Macrauchenia sp. MLP 12-1660 Lujan -59.0 34.5 Buenos Aires Pampean Fm Phalanx 17 188
Macrauchenia sp. MLP 12-1487 no location - - - Pampean Fm Jaw
Macrauchenia patachonica MLP 80-II-10-2 Rio Quequén Salado -60.7 38.4 Buenos Aires Lujanian Jaw 10 52
Macrauchenia sp. MACN Pv 6708 Rio Quequén -58.8 -38.2 Buenos Aires - Tooth 1 56
Macrauchenia sp. MACN Pv 18952 MACN2012.02 near Monte Hermoso -61.3 -39.0 Buenos Aires Lujanian Cervical no date 11 61
Macrauchenia sp. MACN Pv 7107 Arroyo Seco, Mirama 60.5 33.4 Buenos Aires Pampeana Fm/Lujanian? Pedal sesamoid 15 117
Macrauchenia sp. MACN Pv 3 Salto -60.3 34.3 Buenos Aires Pampean Fm Tibia 13 91
Macrauchenia sp. MACN Pv 2 (05) Salto -60.3 34.3 Buenos Aires Pampean Fm Humerus 13 91
Macrauchenia sp. MACN Pv 2(08) Salto -60.3 34.3 Buenos Aires Pampean Fm Tibiofibula 13 91
Macrauchenia sp. MACN Pv 2(07) Salto -60.3 34.3 Buenos Aires Pampean Fm Femur 13 91
Macrauchenia patachonica MACN Pv 8708 Rio Quequén (Grande 89.1 34.6 Buenos Aires PampeanaFm/Lujanian Mandible 13 94
Macrauchenia sp. MACN Pv 10530 Rio Quequén (Grande -58.7 -38.6 Buenos Aires PampeanaFm/Lujanian Metapodial 1 56


Specimens highlighted in bold produced high-quality collagen and were sequenced. Specimens that appear twice were re-sampled. MACN Pv, Museo Argentino de Ciencias Naturales (vertebrate palaeontology 
collection), Buenos Aires, Argentina; MLP, Museo de La Plata; UCIAMS, Keck Carbon Cycle AMS Spectrometer facility, University of California, Irvine, USA; ZMUC, Natural History Museum of Denmark and 
Zoological Museum, Copenhagen, Denmark. Thermal ages of samples with a location, but without a radiocarbon date, are calculated at 50,000 years ago.
Extended Data Table 2 | Comparative run statistics combining multiple runs


Species irae Sees bal Platform Enzymatic digestion Runs os ees a “equonce | Fold coverage oe 
runs) coverage (%) 
Equus sp. (Tapalqué) MACN2010.03 Orbitrap Trypsin/P Consensus 31,309 (5) 9,530 (30.4) 87.3 85.9 
Macrauchenia sp. - - - Consensus 78,515 (4) 9,400 (12.0) 89.4 77.1
18952 MACN2012.02 Bruker maXis HD Trypsin/P MACN201202 17,334 1,485 (8.6) 63.3 16.6 0.19
18952 MACN2012.02 Orbitrap Trypsin/P+elastase York14 6,625 769 (11.6) 59.0 9.6 0
96-V-10-19 MLP2012.12 Bruker maXis HD Trypsin/P MLP2012.12 34,525 3,410 (9.9) 77.3 35.4 2
96-V-10-19 MLP2012.12 Orbitrap Trypsin/P+elastase York15 20,031 3,736 (18.7) 88.5 32.8 cn
Toxodon sp. - - - Consensus 82,448 (4) 12,028 (14.6) 91.0 103.9 
44-XII-29-5 MLP2012.04 Bruker maXis HD Trypsin/P MLP2012.04 20,499 2,720 (13.3) 80.7 28.1 1.29 
44-XII-29-5 MLP2012.04 Orbitrap Trypsin/P+elastase York13 20,706 3,610 (17.4) 84.0 35.8 1.81 
17710 MACN2012.12 Bruker maXis HD Trypsin/P MACN201212 20,134 2,188 (10.9) 76.7 21.0 2.1 
17710 MACN2012.12 Orbitrap Trypsin/P+elastase York12 21,109 3,510 (16.6) 81.7 35.5 0.38 
Mylodon darwinii MLP 94-VIII-10-32 Orbitrap Trypsin/P Consensus 16,592 (1) 1,230 (7.4) 67.8 14.3
*Tapirus terrestris - Orbitrap Trypsin/P+elastase Consensus 17,459 (1) 1,111 (6.4) 92.0 9.8 
*Hippopotamus amphibius - Orbitrap Trypsin/P+elastase Consensus 22,450 (1) 3,080 (13.7) 89.6 26.1 
*Orycteropus afer AMNH 51910 Orbitrap Trypsin/P Consensus 20,481 (1) 3,673 (17.9) 93.8 33.1 
*Cyclopes didactylus AMNH 99199 Orbitrap Trypsin/P+elastase Consensus 41,046 (2) 3,230 (7.9) 83.1 26.5 


Taxa with asterisks are modern; others are fossil. Spectra were acquired on two platforms: Orbitrap for Macrauchenia, Toxodon, Tapirus, Hippopotamus, Orycteropus, Mylodon, and Cyclopes; and maXis HD for
Tapalqué Equus, Macrauchenia, and Toxodon. Individual samples were digested using either trypsin (Tapalqué Equus, Macrauchenia, Toxodon, Orycteropus, and Mylodon) or trypsin pooled with elastase digests
(Macrauchenia, Toxodon, Tapirus, Hippopotamus, and Cyclopes).
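
The figures in this table can be checked for internal consistency. The sketch below assumes, since the column header is not legible in this copy, that the first count in each row is the total number of spectra, the second the number matched, and the parenthesised value the matched percentage; under that reading the Macrauchenia consensus row equals the pooled individual runs.

```python
# (total, matched, printed percentage) for the four Macrauchenia runs.
macrauchenia_runs = {
    "MACN201202": (17334, 1485, 8.6),
    "York14": (6625, 769, 11.6),
    "MLP2012.12": (34525, 3410, 9.9),
    "York15": (20031, 3736, 18.7),
}
macrauchenia_consensus = (78515, 9400, 12.0)

def percent(part, whole):
    """Percentage rounded to one decimal place, as printed in the table."""
    return round(100.0 * part / whole, 1)

def pooled(runs):
    """Sum counts over runs and recompute the percentage."""
    total = sum(t for t, _, _ in runs.values())
    matched = sum(m for _, m, _ in runs.values())
    return total, matched, percent(matched, total)

for name, (total, matched, printed) in macrauchenia_runs.items():
    print(name, percent(matched, total), printed)
print(pooled(macrauchenia_runs), macrauchenia_consensus)
```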
Anomalocaridid trunk limb homology revealed by a
giant filter-feeder with paired flaps 


Peter Van Roy¹,², Allison C. Daley³,⁴ & Derek E. G. Briggs¹,⁵


Exceptionally preserved fossils from the Palaeozoic era provide crucial
insights into arthropod evolution, with recent discoveries bringing
phylogeny and character homology into sharp focus. Integral
to such studies are anomalocaridids, a clade of stem arthropods whose
remarkable morphology illuminates early arthropod relationships
and Cambrian ecology. Although recent work has focused on the
anomalocaridid head, the nature of their trunk has been debated
widely. Here we describe new anomalocaridid specimens from
the Early Ordovician Fezouata Biota of Morocco, which not only
show well-preserved head appendages providing key ecological data,
but also elucidate the nature of anomalocaridid trunk flaps, resolving
their homology with arthropod trunk limbs. The new material
shows that each trunk segment bears a separate dorsal and ventral
pair of flaps, with a series of setal blades attached at the base of the
dorsal flaps. Comparisons with other stem lineage arthropods
indicate that anomalocaridid ventral flaps are homologous with
lobopodous walking limbs and the endopod of the euarthropod
biramous limb, whereas the dorsal flaps and associated setal blades
are homologous with the flaps of gilled lobopodians (for example,
Kerygmachela kierkegaardi, Pambdelurion whittingtoni) and exites
of the ‘Cambrian biramous limb’. This evidence shows that
anomalocaridids represent a stage before the fusion of exite and endopod
into the ‘Cambrian biramous limb’, confirming their basal
placement in the euarthropod stem, rather than in the arthropod
crown or with cycloneuralian worms. Unlike other anomalocaridids,
the Fezouata taxon combines head appendages convergently
adapted for filter-feeding with an unprecedented body length exceeding
2 m, indicating a new direction in the feeding ecology of the clade.
The evolution of giant filter-feeding anomalocaridids may reflect
the establishment of highly developed planktic ecosystems during
the Great Ordovician Biodiversification Event.


Phylum Arthropoda von Siebold, 1848 
Order Radiodonta Collins, 1996 
Family Hurdiidae Vinther, Stein, Longrich & Harper, 2014 
Aegirocassis benmoulae gen. et sp. nov. 


Life Science Identifier (LSID). urn:lsid:zoobank.org:act:35C7BB1E- 
C902-4F7B-9A4B-899005D7B6AE 

Etymology. Ægir: a giant in Norse mythology and god of the sea; cassis
(Latin, helmet): referring to the huge size and elaborate cephalic shield; 
and in recognition of Mohamed ‘Ou Said’ Ben Moula, who discovered 
the Fezouata Biota and the specimens described here. Gender feminine. 
Holotype. Yale Peabody Museum of Natural History specimen YPM 
237172 (Fig. 1, Extended Data Figs 1, 2 and Supplementary Video). 
Other material. Paratypes: YPM 227556 (Extended Data Fig. 3c, d), 
YPM 525437 (Extended Data Fig. 4 and Supplementary Video), YPM 
527123 (Extended Data Fig. 5a—c), YPM 527125 (Fig. 2a-b and Ex- 
tended Data Fig. 6a), YPM 226437, YPM 522227 (Fig. 2c, Extended Data 
Fig. 7a—c). Other notable specimens: YPM 226438, YPM 226439, YPM 
523423-523425 (Extended Data Figs 3e-h, 5f and 7d, e), YPM 523427
(Extended Data Fig. 7d, e), YPM 523428 (Extended Data Fig. 5g), YPM
516785 (Extended Data Fig. 3a, b), YPM 523810 (Extended Data Fig. 5e), 
YPM 525217 (Extended Data Fig. 6b-d), YPM 516791 (Extended Data 
Fig. 8a—c), YPM 227934 (Extended Data Fig. 8e), YPM 516792 (Extended 
Data Fig. 8f) and YPM 527124 (Extended Data Fig. 5d), and setal blades 
associated with YPM 527123 (Extended Data Fig. 8d). Fragmentary 
material of three other articulated individuals, four slabs with disarti- 
culated material belonging to at least 10 individuals, 15 isolated cara- 
pace elements, 14 sets of partial ventral spines and 11 isolated bands of 
setal blades. 

Locality and horizon. Lower Fezouata Formation, latest Tremadocian, 
Araneograptus murrayi Biozone. All three-dimensional specimens were 
collected from two sites on the eastern flank of Jbel Tigzigzaouine, facing 
Oued Ezegzaou. Specimens of carapaces, setal blades and ventral spines 
of the frontal appendages occur at numerous sites throughout the Lower 
Fezouata Formation to the north of Zagora, often in abundance. Detailed 
locality information is curated with the specimens. 

Diagnosis for genus and species. Anomalocaridid with tripartite frontal 
carapace having a central element at least as long as trunk, with an axial 
carina, pointed tip, rounded posterior margin and narrow downturned 
postero-ventral triangular extensions tapering towards rear and over- 
lapping the lateral carapace elements dorsally. Lateral carapace elements 
oval, with rounded antero-dorsal expansion and longitudinal carina 
just below midline. Multisegmented anterior appendages consisting of 
seven podomeres. First podomere longest, with one shorter, comb-like 
ventral spine proximally. Succeeding five podomeres each with a single 
elongate, inward-angled ventral spine with stout setae bearing a double 
row of fine spinules set in a ‘V’ on their dorsal margin. Terminal podo- 
mere stout, with pointed tip. Flat, broad trunk of 11 segments attaining 
maximum width at third segment and tapering to a blunt tip. Two pairs 
of non-overlapping flaps per segment: dorsal flaps pointed with recurving
anterior and posterior margins, width about 1× the length of their
attachment; ventral flaps narrow, triangular, width about 1.5× the length
of their attachment. Continuous band of dorsal setal blades attached to 
base of each pair of dorsal flaps, traversing the trunk. 

A detailed description and interpretation of the material, including 
the filter-feeding frontal appendages, is provided in the Supplemen- 
tary Text. The holotype YPM 237172 is an almost complete three- 
dimensionally preserved individual in slightly oblique dorsal view (Fig. 1, 
Extended Data Fig. 1 and Supplementary Video). The concretion has 
split such that a small block reveals both dorsal and ventral flaps on the 
anterior left of the specimen (Extended Data Fig. 2 and Supplementary 
Video). The tripartite carapace (Extended Data Figs 3, 4 and Supplemen- 
tary Video) extends well in front of the head; the largest isolated carapace 
element exceeds 1 m in length, indicating individuals more than 2 m in
overall length. The frontal appendages are composed of seven podo- 
meres (Fig. 2a, b and Extended Data Fig. 5a—c). The long proximal podo- 
mere bears a short, backwardly directed ventral spine with a comb-like 
array of spines on its posterior margin (Fig. 2a, b and Extended Data
Fig. 5a-c). The five succeeding short podomeres bear long, ventral spines 


¹Department of Geology and Geophysics, Yale University, PO Box 208109, New Haven, Connecticut 06520, USA. ²Research Unit Palaeontology, Department of Geology and Soil Science, Ghent University,
Krijgslaan 281/S8, B-9000 Ghent, Belgium. ³Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford OX1 3PS, UK. ⁴Oxford University Museum of Natural History,
Parks Road, Oxford OX1 3PW, UK. ⁵Yale Peabody Museum of Natural History, Yale University, New Haven, Connecticut 06520, USA.
Figure 1 | A. benmoulae, holotype YPM 237172, Early Ordovician,
Fezouata Biota, Morocco. a-d, Dorsal view: a, part, showing ventral flaps;
b, part, separate block in place, showing dorsal flaps; c, interpretative drawing
combining part and counterpart; d, part, matrix surrounding dorsal flaps
digitally removed to show both sets of flaps. e-g, Lateral view: e, part; f, part,
dorsal flaps added from counterpart; g, interpretative drawing combining part
and counterpart. h, Part, transverse rods composed of hollow cones of third
ventral flap. i, Counterpart, oblique view of anterior free end of setal blades
showing lamellae laterally. Arabic numerals indicate trunk somites.


curving forward distally (Fig. 2c and Extended Data Figs 5d-g, 6 and 7).
These ventral spines were canted inward at about 45° to the longitudinal
axis of the appendage (Extended Data Fig. 5a-c). They carry approximately
80 long, mobile, laterally flattened, flexible setae on their anterior
margin. These setae bear two rows of densely spaced fine spinules set
in a ‘V’ on their dorsal margin (Fig. 2 and Extended Data Fig. 6). The
terminal pointed podomere of the appendage lacks spines (Fig. 2a, b
and Extended Data Fig. 5a-c). No eyes or oral cone have yet been found.
In the trunk, dorsal and ventral flaps are non-overlapping and separated
from each other by intervening body wall (Fig. 1, Extended Data
Figs 1 and 2 and Supplementary Video). Both have densely spaced transverse
rods composed of short, flared, hollow cones one inserted into
another; the basal cone is substantially larger than those succeeding it
(Fig. 1h). The holotype YPM 237172 shows that segmentally arranged
bands of thin, flexible setal blades attach at the base of the dorsal flaps
and traverse the animal dorsally (Fig. 1, Extended Data Fig. 1). Individual
setal blades connect to each other a short distance behind their
anterior margin (Extended Data Fig. 8e). The blades have rounded terminations
and show the presence of fine lamellae, probably on both


sides (Fig. 1i and Extended Data Fig. 8). There is no evidence for the
presence of a tail fan. A reconstruction of A. benmoulae is provided in 
Fig. 3. 

The discovery of dorsal flaps in A. benmoulae warranted re-examination 
of Cambrian anomalocaridids, given that the presence of dorsal flaps is 
difficult to demonstrate in flattened specimens owing to compaction 
and the tendency of the shale to split along one plane. Specimens of 
Peytoia nathorsti from the Burgess Shale revealed clear evidence of their 
presence in National Museum of Natural History specimen USNM 
274161 (Extended Data Fig. 9a—c), and possibly USNM 274145 (Extended 
Data Fig. 9e). There are also indications of two sets of flaps in Hurdia 
(Royal Ontario Museum specimens ROM 49930 and ROM 59320) but 
in this case the evidence is more circumstantial (see Supplementary Text). 

Given their phylogenetic position immediately stemward of euarthropods
(Fig. 4, Extended Data Fig. 10 and Supplementary Text), the
apparent absence of biramous appendages has been an anomalous aspect
of anomalocaridid morphology. It was usually assumed that anomalocaridid
lateral flaps were homologous to the flaps of gilled lobopodians
such as Kerygmachela kierkegaardi and that ventral limbs were lost.
Figure 2 | A. benmoulae, appendages and ventral spines, Early Ordovician,
Fezouata Biota, Morocco. a, Complete frontal appendage with partial ventral
spines, showing mobile spinulose filtrating setae, paratype YPM 527125.
Detail of the spinulose filtrating setae is provided in Extended Data Fig. 6a.
b, Interpretative drawing of YPM 527125. c, Partial appendage with
complete ventral spines, paratype YPM 522227. Roman numerals indicate
appendage podomeres.


The previously known flaps in Cambrian anomalocaridids,
however, overlap from posterior to anterior, the reverse of the arrangement
in the more basal K. kierkegaardi, Pambdelurion whittingtoni and
Opabinia regalis. This anomaly is resolved by the discovery of
additional, dorsal flaps in A. benmoulae, P. nathorsti and probably also
Hurdia victoria: the position and morphology of the dorsal flaps indicate
that they are homologous with those in gilled lobopodians. Thus,
the larger ventral flaps in A. benmoulae and Cambrian anomalocaridids
are here considered to be homologous with the lobopodous limbs
of K. kierkegaardi and P. whittingtoni. This interpretation is supported
by the presence of limbs in the anomalocaridid Cucumericrus decoratus,
which shows lobopodous walking limbs overlain dorsally by a single set
of flaps (see Supplementary Text). The setal blades in A. benmoulae and
other anomalocaridids, which are attached to the dorsal flaps and overlie
the trunk, are probably homologous with the less extensive ‘gill-like’
wrinkled structures on the flaps of K. kierkegaardi and P. whittingtoni,


and the setal blades in O. regalis (Fig. 4), although an alternative
interpretation for these last structures has been advanced.

It has been suggested that the Cambrian biramous limb arose through 
the sclerotization of the lobopodous walking limb and its fusion with 
the dorsal flap of gilled lobopodians, which was reduced to leave the 
gill as an exite. The presence of a dorsal gill-bearing flap inserting
separately to the ventral limb-derived flap in A. benmoulae and other 
anomalocaridid taxa indicates that they pre-date the acquisition of 
biramous limbs. This confirms their place on the euarthropod stem 
(Fig. 4, Extended Data Fig. 10 and Supplementary Text), resolving the 
debate on their phylogenetic position in line with recent neurological 
evidence.

Among arthropods, the size of A. benmoulae (over 2 m in length) 
is paralleled only by some pterygotid eurypterids and terrestrial
arthropleurids. The evolution of gigantic filter-feeders within clades
of nektic macrophagous predators is well documented in Mesozoic 


Figure 3 | A. benmoulae, reconstruction, Early Ordovician, Fezouata Biota, 
Morocco. Eye shape and position inferred from related taxa, with position 
further supported by the posterior gape between the carapace elements. The 
eyes are deliberately depicted comparatively smaller than in other 
anomalocaridids: to achieve visual acuity comparable to that of more
diminutive forms, a large animal requires smaller eyes relative to its body size. 
In addition, a filter-feeding lifestyle demands less acute vision than a 
macropredatory mode of life, further reducing the need for large eyes. 
Figure 4 | Simplified cladogram showing the position of A. benmoulae,
and schematic cross-sections through the bodies of included taxa 
illustrating the limb homologies and morphological transitions. The 
position of setal blades in C. decoratus is uncertain. A more extensive cladogram 
is provided in Extended Data Fig. 10. Light blue, ventral limbs/endopods; 
light orange, dorsal flaps; dark orange, setal blades/exites. 


pachycormid fish and Cenozoic sharks and whales. The huge size of
A. benmoulae represents a much earlier example of a filter-feeding lifestyle
correlating to gigantism. The abundance of gigantic anomalocaridid
filter-feeders in the high palaeolatitude Fezouata Biota points to a
complex planktic ecosystem. Early Cambrian anomalocaridid filter-feeders
also fed on zooplankton, but they remained relatively small.
Although the Cambrian Explosion saw the establishment of the first
complex planktic ecosystems, the convergent (Supplementary Text) rise
of giant filter-feeding anomalocaridids during the Ordovician followed
an increase in the abundance and diversity of phytoplankton and a
consequent zooplankton radiation as part of the Great Ordovician
Biodiversification Event.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

The Fezouata specimens are housed in the collections of the Yale Peabody Museum 
of Natural History (YPM), while the Peytoia material examined is at the National 
Museum of Natural History (USNM) and the Hurdia fossils used for this paper are 
in the collections of the Royal Ontario Museum (ROM). 

The Moroccan specimens were mechanically prepared using PaleoTools ME9100, 
PaleoAro, MicroJack5 and MicroJack1 air scribes, and needles and scalpels. Specimens
were glued with Paraloid B-72 dissolved in acetone, after which they received
a protective coat of consolidant, consisting of a 5% solution of Butvar B-98 in 
ethanol. 

For photography, the Moroccan specimens were illuminated by a 500 W tungsten 
floodlight with an Aflash Photonics linear polarizer in front; a Cokin XPro X164 
circular polarizer was mounted on the camera lens and crossed with the polarizer 
of the light source to maximize contrast. All parts were lit from the northwest. 
With the exception of the flaps, counterparts were illuminated from the southwest 
and mirrored in Adobe Photoshop CC 2014 to create a false-positive relief image 
and facilitate direct comparison of part and counterpart. In some cases, where indi- 
cated, information from part and counterpart was combined digitally into a single 
image in Adobe Photoshop CC 2014 to facilitate interpretation. All specimens 
were photographed dry, with the exception of YPM 227934, which was imaged 
under ethanol. 
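
The contrast gain from crossing the two polarizers can be illustrated with Malus's law, under which an ideal analyser passes a fraction cos²θ of linearly polarized light. This sketch is a textbook idealization added for illustration, not a calculation from the paper: specular glare that keeps the source polarization is blocked at 90°, whereas depolarized light scattered from the specimen still partly passes.

```python
import math

def transmitted_fraction(angle_degrees):
    """Fraction of linearly polarized light passed by an ideal analyser."""
    return math.cos(math.radians(angle_degrees)) ** 2

# Parallel, intermediate and crossed orientations of the two polarizers.
for angle in (0, 45, 90):
    print(angle, round(transmitted_fraction(angle), 3))
```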

The micrograph of the muscle tissue in Extended Data Fig. 2g was taken with a 
Leica DFC 425 digital camera attached to a Leica MZ16 binocular microscope with 
a Leica Plan APO 1× lens and steered from a computer through Leica Application
Suite 4.2. All other photographs were taken with a Hasselblad H4D-200MS med- 
ium frame digital single-lens reflex camera attached to a computer and operated 
remotely in six-shot mode through Hasselblad Phocus 8.2.1 software to acquire 
images of 200 megapixel resolution. Overview photographs of YPM 237172 used a 
Hasselblad HC 2.8/80 mm lens stopped down to f/8; close-ups and all other, smaller
specimens were photographed with a Hasselblad HC Macro 4/120 mm II lens 
stopped down to f/9.5. Lens distortion was corrected using Hasselblad Phocus 
8.2.1 software. Stacks of between 10 and 50 images were taken in aperture priority 
mode, with manual focusing through the focal plane. After exporting the FFF-format 
digital negatives to TIFF from Hasselblad Phocus 8.2.1, the photographs were 
stacked in Zerene Stacker 1.04 (64 bit) using the PMax pyramid stack algorithm. 
The stacked images were then post-processed in Adobe Photoshop CC 2014, first 
applying the ‘Sharpen more’ and ‘Sharpen’ functions, followed by removal of the 
background. Levels were then manually balanced while holding down the ‘alt’ key 
to prevent clipping of pixels in the specimen; the grey level was always retained at 
50%. In a few cases, some minor adjustments were made to the exposure. The high- 
resolution images were down-sampled in Adobe Photoshop CC 2014 to lower- 
resolution TIFF files for use in the plates. 

The Burgess Shale specimens were imaged immersed in water, with polarized 
lighting sourced from the northwest; a second polarizer in front of the camera lens 
was crossed with the polarization of the light source to enhance contrast. Photo- 
graphs were taken using a Canon EOS 500D small-frame digital single-lens reflex 
camera controlled remotely using the EOS Utility 2.8.1.0 program. The camera was 
fitted with a Canon EF-S 60 mm Macro Lens, which was stopped down to f/2.8 
(Extended Data Fig. 9d), f/3.5 (Extended Data Fig. 9a), f/4.0 (Extended Data Fig. 9b) 
or f/4.5 (Extended Data Fig. 9e). Images were post-processed in Adobe Photoshop 
CS6 using the ‘Sharpen’ function; minor adjustments were made to the exposure, 
and the background was removed where necessary. Extended Data Fig. 9 was created 
using Adobe Illustrator CS6. 

Explanatory drawings of the specimens were prepared in Adobe Illustrator CS6 
on the basis of the high-resolution images. Photographs of part and counterpart were 
used to create composite drawings. The drawings were consistently colour-coded 
to allow identification of anatomical structures. 

Extended Data Figure 1 | A. benmoulae, nearly complete three-dimensionally 
preserved specimen, counterpart, dorsal view, Early Ordovician, Fezouata Biota, 
Morocco, holotype YPM 237172. a, With separate blocks in place, showing ventral 
flaps. b, With one block removed, showing dorsal flaps. c, With two blocks removed, 
showing dorsal flaps alone. d, Digital combination of images, showing both dorsal 
and ventral flaps. e, Interpretative drawing of dorsal view combining information 
from part and counterpart. Arabic numerals indicate trunk somites. 

Extended Data Figure 2 | A. benmoulae, nearly complete three-dimensionally 
preserved specimen, dorsal and ventral flaps, Early Ordovician, Fezouata Biota, 
Morocco, holotype YPM 237172. a, Separate block, part, dorsal flaps, plan view. 
b, Separate block, lateral view showing body wall (counterpart), and dorsal (part) 
and ventral flaps (counterpart). c, Separate block, counterpart, ventral flaps. 
d, Counterpart, dorsal flaps. e, Interpretative drawing of lateral view of separate 
block. f, Part, ventral flaps. g, Part, muscle tissue closely associated with first 
dorsal flap on left side, showing individual fibres. 

Extended Data Figure 3 | A. benmoulae, central elements of carapace, Early 
Ordovician, Fezouata Biota, Morocco. a, b, YPM 516785: a, nearly complete 
central element, part, dorsal view; b, interpretative drawing. c, d, Paratype 
YPM 227556: c, nearly complete central element, part, dorsal view; 
d, interpretative drawing. e, f, YPM 523425: e, ventral triangular extension, 
counterpart, showing marginal rim and texture; f, interpretative drawing. 
g, h, YPM 523424: g, partial central element, part, oblique, showing second 
morph with additional anterior triangular extension; h, interpretative drawing. 

Extended Data Figure 4 | A. benmoulae, complete carapace lateral element 
associated with partial central element, Early Ordovician, Fezouata Biota, 
Morocco, paratype YPM 525437. a, With partial central element, part, in 
place. b, With partial central element, part, removed, revealing counterpart 
imprint of triangular ventral extension. c, With dorsal side of central element 
digitally removed, revealing triangular ventral extension overlying anterior of 
lateral element. d, Interpretative drawing. 

Extended Data Figure 5 | A. benmoulae, appendages and ventral spines, 
Early Ordovician, Fezouata Biota, Morocco. a-c, Paratype YPM 527123, 
nearly complete appendage: a, part; b, interpretative drawing combining part 
and counterpart; c, counterpart. d, YPM 527124, part, distal portion of ventral 
spines. Setae showing double row of spinules arrowed. YPM 527123 and 
527124 belong to a disarticulated assemblage which may represent a single 
individual. e, YPM 523810, part, distal portion of five ventral spines. f, YPM 
523423 and 523424, counterpart, ventral spines and partial carapace element. 
g, YPM 523428, part, termination of ventral spine. Roman numerals 
indicate appendage podomeres. 

Extended Data Figure 6 | A. benmoulae, appendages and appendage ventral 
spines, Early Ordovician, Fezouata Biota, Morocco. a, Close-up of ventral 
spines of YPM 527125, showing spinulose filtrating setae and their insertion 
on the anterior margin of the ventral spines. Setae showing double row of 
spinules arrowed. b-d, YPM 525217, partial appendage: b, part; 
c, interpretative drawing combining information from part and counterpart; 
d, counterpart. Roman numerals indicate appendage podomeres. 

Extended Data Figure 7 | A. benmoulae, Early Ordovician, Fezouata 
Biota, Morocco. a-c, Partial appendage, paratype YPM 522227: a, part; 
b, interpretative drawing combining information from part and counterpart; 
c, counterpart. Roman numerals indicate appendage podomeres. 
d, e, Assemblage of carapace elements, appendage ventral spines and setal 
blades, YPM 523423-523427: d, specimen; e, interpretative drawing. 

Extended Data Figure 8 | A. benmoulae, isolated bands of setal blades, 
Early Ordovician, Fezouata Biota, Morocco. a-c, YPM 516791: a, part; 
b, counterpart; c, close-up of counterpart, showing fine lateral lamellae on setal 
blades in plan view. d, Specimen associated with YPM 527123, part, showing 
lamellae on setal blades. e, YPM 227934, part, showing connection between 
setal blades and division into short anterior and long posterior free parts. 
f, YPM 516792, part. 

Extended Data Figure 9 | P. nathorsti, articulated specimens showing 
dorsal flaps, middle Cambrian, Burgess Shale, Canada. a, USNM 274156 and 
USNM 274161 joined into complete specimen. White box indicates area of 
close-ups of USNM 274161 in b and c. b, c, USNM 274161: b, posterior, 
counterpart, showing two sets of flaps; c, interpretative drawing. d, USNM 
274154, the opposite half of the split corresponding to USNM 274156 and 
274161. e, USNM 274145. Blue arrows indicate ventral flaps; orange arrows 
indicate dorsal flaps. 

Extended Data Figure 10 | Results of the phylogenetic analysis. Strict 
consensus of 70 most parsimonious trees obtained under equal weighting 
(consistency index = 0.611; retention index = 0.798). Numbers above nodes 
indicate Bremer support/standard bootstrap (1,000 replicates) values; 
number below nodes is the jackknife (1,000 replicates, P = 36) value. An 
identical strict consensus tree is obtained with implied weighting for all k values 
from 3 to 8. 

eIF3 targets cell-proliferation messenger RNAs for 
translational activation or repression 

Amy S. Y. Lee, Philip J. Kranzusch & Jamie H. D. Cate 


Regulation of protein synthesis is fundamental to all aspects of eukaryotic 
biology, controlling development, homeostasis and stress responses. The 
13-subunit, 800-kilodalton eukaryotic initiation factor 3 (eIF3) organizes 
initiation factor and ribosome interactions required for productive translation. 
However, current understanding of eIF3 function does not explain genetic 
evidence correlating eIF3 deregulation with tissue-specific cancers and 
developmental defects. Here we report the genome-wide discovery of human 
transcripts that interact with eIF3 using photoactivatable 
ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP). eIF3 
binds to a highly specific program of messenger RNAs involved in cell growth 
control processes, including cell cycling, differentiation and apoptosis, via 
the mRNA 5′ untranslated region. Surprisingly, functional analysis of the 
interaction between eIF3 and two mRNAs encoding the cell proliferation 
regulators c-JUN and BTG1 reveals that eIF3 uses different modes of RNA 
stem-loop binding to exert either translational activation or repression. Our 
findings illuminate a new role for eIF3 in governing a specialized repertoire of 
gene expression and suggest that binding of eIF3 to specific mRNAs could be 
targeted to control carcinogenesis. 

Extensive genetic evidence implicates eIF3 in other functions in translation 
outside of its general role as a protein scaffold for the formation of 
initiation complexes. Mutation or inactivation of eIF3 subunits results 
in developmental defects in Caenorhabditis elegans and zebrafish. 
Furthermore, analyses of human tumours reveal that overexpression of 
eIF3 is linked to diverse cancers, including breast, prostate and oesophageal 
malignancies. The integral role of eIF3 during cellular differentiation, 
growth and carcinogenesis suggests that eIF3 might drive specialized 
translation. Consistent with this hypothesis, translation of hepatitis C 
virus RNA occurs through essential interactions between eIF3 and a 
structured internal ribosome entry site (IRES) element in the viral 
genome, indicating the feasibility of translation regulation being driven 
by distinct cellular eIF3-mRNA contacts. 

To identify candidate transcripts regulated through direct interactions 
with eIF3, we first used a genome-wide approach to determine the eIF3 
RNA-binding targets in human 293T cells. Because eIF3 is composed 
of 13 subunits (eIF3a-m), we adapted a 4-thiouridine PAR-CLIP approach 
to allow analysis of a large multimeric complex, with isolation of 
individual subunit-RNA libraries (Fig. 1a). As overexpression of single eIF3 
subunits can alter complex assembly, we optimized immunoprecipitation 
of the full endogenous eIF3 complex using an antibody that 
recognizes the eIF3b subunit (Fig. 1b). High-salt washes were used to 
ensure removal of potentially contaminating translation factors, such 
as eIF4G or the small ribosomal subunit (Fig. 1c). After RNase digestion, 
separation of crosslinked eIF3-RNA complexes by denaturing gel 
electrophoresis demonstrated that four of the thirteen subunits crosslink 
directly to RNA (Fig. 1d), identified by mass spectrometry as eIF3a, 
b, d and g (Extended Data Fig. 1). 

For each subunit, separate complementary DNA libraries were generated 
from the isolated crosslinked RNAs and deep sequenced using 
Illumina technology. Sequenced reads from three biological replicates were 
mapped to the genome and grouped into eIF3-binding sites by using the 
cluster-finding tool PARalyzer. Read clusters were found in 479 unique 
genes, with eIF3a, b, d and g crosslinking to 328, 264, 356 and 352 
transcripts, respectively (Supplementary Tables 1 and 2). The limited number 
of interacting genes supports capture of specific eIF3-RNA contacts, 
as these targets comprise only ~3% of total expressed transcripts 
(Extended Data Fig. 2). As a further control, we do not see crosslinking 
to highly abundant ribosomal RNAs, in agreement with biochemical 
and structural studies showing that eIF3 interacts primarily with the 
protein-rich face of the small ribosomal subunit. 

Figure 1 | PAR-CLIP of the multi-protein translation initiation factor 
complex eIF3. a, Schematic of PAR-CLIP methodology. 4-Thiouridine-labelled 
(s⁴U) RNAs were crosslinked to proteins and endogenous eIF3 
complexes were immunoprecipitated using an antibody that recognizes eIF3b. 
Separate cDNA libraries were constructed for individual crosslinked subunits. 
b, Immunoprecipitation (IP) of the eIF3 complex. Magnetic beads without 
eIF3b antibody were used as a negative control. c, Western blot of 
immunoprecipitated complexes after PAR-CLIP. d, Phosphorimage of SDS gel 
resolving 5′ ³²P-labelled RNAs crosslinked to eIF3 subunits. Crosslinked RNAs 
cause the subunits to migrate ~10 kDa above their expected size. 
Immunoprecipitated samples prepared from 4-thiouridine-labelled 293T cell 
lysates treated without ultraviolet (UV) 365 nm light are shown as a negative 
control. Coomassie blue staining of purified native eIF3 resolved by 
SDS-polyacrylamide gel electrophoresis (SDS-PAGE) is shown for size reference. 

Figure 2 | Analysis and validation of eIF3 PAR-CLIP-derived binding sites. 
a, Length distribution of PAR-CLIP clusters. nt, nucleotides. b, Distribution of 
number of PAR-CLIP clusters per gene. c, Distribution of PAR-CLIP targets 
among different combinations of eIF3 subunit crosslinking. d, Validation of 
PAR-CLIP targets by eIF3 immunoprecipitation and RT-PCR. eIF3 
immunoprecipitation (IP) was performed using an anti-eIF3b antibody as in 
Fig. 1b. As negative controls, the immunoprecipitation was performed with 
isotype-matched immunoglobulin G (IgG) or anti-haemagglutinin tag (HA) 
antibody. e, Distribution of eIF3 crosslinking sites along mRNAs and in other 
classes of RNAs. CDS, coding sequence; Misc., miscellaneous. 

The majority of RNAs contained a single eIF3-binding site, with a 
median cluster length of 25 nucleotides (Fig. 2a, b). These RNAs interact 
with distinct combinations of eIF3a, b, d and g subunits (Fig. 2c). To 
validate the RNAs identified by PAR-CLIP, we performed eIF3 
immunoprecipitation in the absence of crosslinking. We detected eIF3-RNA 
interactions for five top candidate genes using polymerase chain 
reaction with reverse transcription (RT-PCR), whereas a negative control 
mRNA, the PSMB6 transcript, was not immunoprecipitated (Fig. 2d). 
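
The per-subunit totals reported above (328, 264, 356 and 352 transcripts) sum to 1,300, well above the 479 unique genes, because many genes crosslink to more than one subunit. A minimal sketch of how targets could be tallied by subunit combination, as summarized in Fig. 2c, is given below. The gene names and the function `combination_counts` are hypothetical placeholders, not the published data or the authors' pipeline.

```python
from collections import Counter

# Hypothetical crosslinking calls: subunit -> set of target genes.
targets = {
    "eIF3a": {"GENE1", "GENE2", "GENE3"},
    "eIF3b": {"GENE2", "GENE3"},
    "eIF3d": {"GENE3", "GENE4"},
    "eIF3g": {"GENE1", "GENE3", "GENE4"},
}


def combination_counts(targets):
    """Count genes per combination of crosslinking subunits."""
    all_genes = set().union(*targets.values())
    counts = Counter()
    for gene in all_genes:
        # Tuple of every subunit that crosslinks to this gene, in sorted order.
        combo = tuple(s for s in sorted(targets) if gene in targets[s])
        counts[combo] += 1
    return counts


counts = combination_counts(targets)
for combo, n in sorted(counts.items()):
    print("+".join(combo), n)
```

Summing the counts over all combinations recovers the number of unique genes, whereas summing the sizes of the per-subunit sets counts shared targets more than once.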

In eukaryotic protein synthesis, the 5′ UTR of mRNA is thought to 
be the major site of translation regulation. Consistent with a role for 
specific eIF3-mRNA interactions in translation regulation, the 
eIF3-binding sites predominantly mapped to the 5′ UTR (~70%) (Fig. 2e). 
To examine the impact of transcript-specific engagement of eIF3 on 
translational control, we focused on two genes with an eIF3-binding site 
in the 5′ UTR, c-JUN and B-cell translocation gene 1 (BTG1) (Fig. 3a, 
b). c-JUN is a member of the immediate early response transcription 
factor AP1 and a positive mitotic regulator. In contrast, BTG1 acts as 
a negative regulator of proliferation and its expression induces cellular 
differentiation. Because of the opposing effects of c-JUN and BTG1 
on cellular growth, we wanted to understand why eIF3 would interact 
with both mRNAs. We constructed luciferase reporters containing the 
5′ UTR of c-JUN or BTG1 with or without the eIF3 crosslinking site 
identified by PAR-CLIP (Fig. 3c). Deletion of the crosslinking site from 
the 5′ UTR of c-JUN abolished translation of mRNAs transfected into 
cells, indicating that eIF3 binding is required for efficient translation 
(Fig. 3d). In stark contrast, BTG1 translation was highly upregulated 
when the eIF3-binding site was removed from the mRNA (Fig. 3e). 
Furthermore, treatment of 293T cell in vitro translation extracts with 
m⁷G cap analogue inhibited translation of both c-JUN and BTG1 
luciferase reporter mRNAs, demonstrating that eIF3-dependent translation 
regulation of these transcripts is cap-dependent and thus distinct from 
viral IRES-like mechanisms (Fig. 3f, g). These results demonstrate 
that eIF3 can act as both a translation activator and repressor of specific 
cellular mRNAs. 
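
The deletion reporters can be illustrated with a short sketch. The Fig. 3 legend gives the eIF3 PAR-CLIP cluster as nucleotide positions 181-214 of the c-JUN transcript (GenBank accession NM_002228). Assuming these are 1-based, inclusive coordinates and that the deletion construct removes exactly that interval (neither is stated explicitly in the text), the edit amounts to the following. The function name `delete_region` and the placeholder sequence are hypothetical; a real analysis would load the sequence from the GenBank record.

```python
def delete_region(seq, start, end):
    """Return seq with 1-based, inclusive positions start..end removed."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return seq[:start - 1] + seq[end:]


# Placeholder 300-nt sequence standing in for the start of the transcript.
utr = "".join("ACGU"[i % 4] for i in range(300))

# Remove the reported c-JUN cluster, positions 181-214 (34 nucleotides).
deleted = delete_region(utr, 181, 214)
print(len(utr), len(deleted))
```

Keeping the coordinate convention explicit matters here, because an off-by-one error would leave part of the crosslinking site in the reporter.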

To understand how eIF3 binding to mRNA leads to opposing translation 
phenotypes, we next identified the full RNA elements for eIF3 
recognition in the c-JUN and BTG1 mRNAs. While PAR-CLIP marks 
the localized vicinity of eIF3 in the 5′ UTR, eIF3 interaction could occur 
either through recognition of a linear sequence or in the context of RNA 
secondary structure. Using selective 2′-hydroxyl acylation analysed by 
primer extension (SHAPE), we experimentally determined the secondary 
structure around the eIF3-binding sites (Fig. 4a, d). For both c-JUN 
and BTG1, SHAPE revealed that the eIF3-binding sites map to structured 
RNA regions corresponding to conserved stem-loops (Extended Data 
Fig. 3). For the c-JUN mRNA element, we investigated the importance 
of secondary structure in eIF3 recognition by mutating base-pairing 
interactions of five nucleotides in the stem while leaving the 
crosslinking site intact (Fig. 4b). eIF3 directly bound to the c-JUN 
stem-loop, but not the mutated stem-loop, as determined by native agarose 
gel electrophoresis with radiolabelled RNA and recombinant or native 
eIF3 (Fig. 4b and Extended Data Fig. 4). Furthermore, the same mutations 
in the c-JUN luciferase reporter mRNA led to the identical translation 
phenotype as deletion of the full eIF3 crosslinking site (Fig. 4c 
and Extended Data Fig. 5a). Unlike its interactions with the c-JUN RNA, 
eIF3 was unable to bind to the BTG1 stem-loop in a binary fashion 
(Extended Data Fig. 4b). As eIF3 immunoprecipitates BTG1 mRNA in 
cell lysates (Fig. 2d), this suggests that other currently unknown factors 
are required for this mode of eIF3-RNA interaction. To verify that the 
BTG1 stem-loop is sufficient for the inhibitory translation phenotype 
of eIF3 binding, we asked whether addition of the stem-loop could 
block translation driven by the PSMB6 5′ UTR, which does not interact 
with eIF3 (Fig. 2d). Transplantation of the BTG1 stem-loop 
into the PSMB6 5′ UTR conferred translation inhibition (Fig. 4e 
and Extended Data Fig. 5b). Importantly, addition of the transversed 
BTG1 stem-loop sequence does not alter PSMB6 translation, 
confirming that BTG1 stem-loop-driven translation repression is 
eIF3-specific and not due to introduction of a potentially deleterious 
RNA secondary structure (Fig. 4e and Extended Data Fig. 5b). 
Together, these results demonstrate that during translation activation, 
eIF3 recognizes the c-JUN mRNA by directly binding to a sequence in 
the context of a stem-loop structure, whereas during translation 
inhibition, eIF3 binding to the BTG1 stem-loop requires the presence of 
additional factors or modifications. 

Figure 3 | eIF3 is a positive and negative transcript-specific translational 
regulator. a, b, eIF3 PAR-CLIP cluster in the 5′ UTR of c-JUN mRNA (a) or 
BTG1 mRNA (b). Reads mapped are shown along the respective genes. 
c, Schematic of c-JUN and BTG1 5′ UTR-luciferase reporter mRNAs. The eIF3 
PAR-CLIP cluster is nucleotide positions 181-214 for the c-JUN transcript 
(GenBank accession NM_002228) and positions 105-187 for the BTG1 
transcript (GenBank accession NM_001731). WT, wild type. d, e, Luciferase 
activity in cells transfected with mRNAs containing the c-JUN (d) or 
BTG1 (e) 5′ UTR with or without a deletion of the eIF3 crosslinking site. 
f, g, Luciferase activity in vitro from mRNAs driven by the c-JUN (f) or BTG1 
(g) 5′ UTR, with or without competitor m⁷G cap analogue. The results of 
d-g are given as the mean ± standard deviation (s.d.) of three independent 
experiments, each performed in triplicate. 

Figure 4 | Opposing translation phenotypes are driven by different modes 
of eIF3-mRNA binding. a, SHAPE-based secondary structure of the c-JUN 
5′ UTR surrounding the eIF3 PAR-CLIP site. Nucleotides are colour-coded by 
their SHAPE reactivities, with higher reactivity reflecting single-stranded 
behaviour and non-reactivity indicating base pairing between nucleotides. 
b, Representative native gel shifts showing a specific and binary interaction 
between recombinant eIF3 and the wild-type (WT) c-JUN stem-loop (SL) 
structure but not the mutated stem-loop. c, Luciferase (Luc) activity in vitro 
of mRNAs driven by the c-JUN 5′ UTR containing stem-loop mutations. Mut 
SL, mutant stem-loop. d, SHAPE-based secondary structure of the BTG1 
5′ UTR surrounding the eIF3 PAR-CLIP site. e, Luciferase activity in vitro from 
mRNAs driven by a PSMB6 5′ UTR-BTG1 stem-loop chimaera. Rev SL, 
transversed stem-loop. The results of c and e are given as the mean ± s.d. 
of three independent experiments, each performed in triplicate. 
f, g, Representative images of the effect of siRNA-mediated knockdown of 
c-JUN (f) or BTG1 overexpression (g) on Matrigel invasion by H1299 cells. As 
a control, cells were transfected with a non-targeting siRNA (f) or empty vector 
(g). Quantification of cell migration is presented in Extended Data Fig. 6. 
The results of f and g are representative of three independent experiments, each 
performed in duplicate. 
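
The luciferase results in Figs 3 and 4 are reported as the mean ± s.d. of three independent experiments, each performed in triplicate. One common way to compute such a summary is to average the technical replicates within each experiment and then take the mean and sample standard deviation across experiments. The sketch below assumes that aggregation scheme, which the legends do not specify; the readings and the function name `summarize` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical normalized luciferase readings:
# three independent experiments, three technical replicates each.
experiments = [
    [0.9, 1.0, 1.1],
    [1.1, 1.2, 1.3],
    [0.7, 0.8, 0.9],
]


def summarize(experiments):
    """Mean and sample s.d. across experiments, after averaging replicates."""
    per_experiment = [mean(replicates) for replicates in experiments]
    return mean(per_experiment), stdev(per_experiment)


avg, sd = summarize(experiments)
print(f"{avg:.2f} ± {sd:.2f}")
```

Treating the experiment, rather than the individual well, as the unit of replication keeps technical replicates from inflating the apparent sample size.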

Although misregulation of eIF3 levels is implicated in carcinogenesis, 
it was previously unknown whether eIF3 activities lead to these cell growth 
alterations. Gene ontology analysis of the PAR-CLIP results establishes 
direct binding of eIF3 to RNA targets enriched in cancer-associated cell 
growth regulation pathways, such as apoptosis, cell cycling and 
differentiation (Extended Data Fig. 6a). The combination of these targets may 
represent a gene program that supports overactive cell proliferation during 
eIF3-related malignancies. In support of this, our results demonstrate 
that eIF3 acts as a positive translational regulator of c-JUN, which is a 
proto-oncogene required for RAS-mediated transformation, and a negative 
regulator of BTG1, of which genomic deletions are found in 9% of B-cell 
precursor acute lymphoblastic leukaemias. Furthermore, circumventing 
eIF3 translational control by knockdown of c-JUN or overexpression 
of BTG1 decreases cell invasiveness of H1299 human lung cancer cells, 
which overexpress eIF3a (Fig. 4f, g and Extended Data Fig. 6). Thus, 
we suggest that the RNAs identified by PAR-CLIP may be co-opted 
upon eIF3 overexpression, leading to loss of correct translational 
control of cell growth and eventual malignancy. 

Although it is surprising that eIF3 can act as both a repressor and 
activator of translation, analogous contrasting functions have been found 
with other multi-protein complexes. For example, the RNA polymerase 
II regulation complex Mediator consists of at least 30 proteins in humans. 
It directs either transcription activation or repression, dependent on promoter 
sequence, gene-specific regulatory proteins, and altered phosphorylation 
states of subunits. Intriguingly, more than 25 posttranslational 
modifications have been detected on eIF3, with a number of them at 
substoichiometric levels, and eIF3 association with other translation 
regulatory proteins such as the helicase eIF4B is regulated by mitogenic 
signalling. Furthermore, modelling of the eIF3 subunits, except for eIF3d, 
reveals that the crosslinked subunits form a nexus in a distal region of eIF3 
positioned near the mRNA entry tunnel (Extended Data Fig. 7). As 
the PAR-CLIP sites exhibit all variations of interactions with the four 
eIF3 subunits (Fig. 2c), we propose that there may be multiple modes of 
eIF3-RNA interactions driven by this region of eIF3 (refs 12, 13). 

Recent studies have highlighted that certain factors possess roles out- 
side of their general functions in translation. For example, the ribosome 
mediates translational specificity during development and viral infection 
through the requirement for distinct ribosomal proteins. During 
canonical translation, eIF3 acts as a protein scaffold for initiation complex 
assembly. Our results now reveal a new paradigm for translational 
control, in which, in addition to this general function, eIF3 can act as 
both an activator and repressor of cap-dependent transcript-specific 
translation through direct binding to defined RNA structural elements. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 

Cells and transfections. Human 293T cells were maintained in DMEM (Invitrogen) 
with 10% FBS (Tissue Culture Biologicals). H1299 cells were maintained in 
ATCC-formulated RPMI-1640 (ATCC) with 10% FBS. IMR90 cells were maintained in 
Eagle’s MEM (ATCC) with 10% FBS. RNA transfections were performed using TransIT 
mRNA reagent (Mirus), with the following modifications to the manufacturer’s 
protocol. Twenty-four hours before transfection, 293T cells were seeded into opaque 
96-well plates to be at ~80% confluence at the time of transfection. For each well, 9 µl 
of pre-warmed OptiMEM (Invitrogen) was mixed with 90 ng of RNA, 0.27 µl of Boost 
reagent and 0.27 µl of TransIT mRNA reagent. Reactions were incubated for 3 min 
at room temperature and added drop-wise to the well; luciferase activity was assayed 
18 h after transfection. Plasmid transfections were performed using Lipofectamine 
2000 (Invitrogen), according to the manufacturer’s protocol, and Matrigel or western 
blot assays were performed 48 h after transfection. siRNA transfection was 
performed using Lipofectamine 2000, with the following modifications to the 
manufacturer’s protocol. Forty-eight hours after transfection, cells were split into new 6-well 
plates to be at ~70% confluence 24 h after seeding. These cells were then transfected 
for a second time with siRNA, and harvested for Matrigel or western blot assays 48 h 
after the second transfection. 
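
When many wells are transfected in parallel, the per-well amounts given above (9 µl OptiMEM, 90 ng RNA, 0.27 µl Boost reagent and 0.27 µl TransIT mRNA reagent) are typically scaled into a master mix. The sketch below shows that arithmetic. The 10% pipetting overage and the function name `master_mix` are assumptions made for illustration and are not part of the published protocol.

```python
# Per-well amounts taken from the transfection protocol described above.
PER_WELL = {
    "OptiMEM_ul": 9.0,
    "RNA_ng": 90.0,
    "Boost_ul": 0.27,
    "TransIT_ul": 0.27,
}


def master_mix(n_wells, overage=0.10):
    """Scale per-well amounts to n_wells plus a fractional overage.

    The default 10% overage is an illustrative assumption.
    """
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1 + overage)
    return {name: round(amount * factor, 2) for name, amount in PER_WELL.items()}


mix = master_mix(12)
for name, amount in mix.items():
    print(name, amount)
```

Calling `master_mix(n, overage=0)` returns the exact protocol amounts multiplied by the number of wells.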

Plasmids and siRNAs. To generate the c-JUN and BTG1 5′ UTR luciferase reporter 
plasmids, sections of the 5′ UTR were first amplified from human cDNA. These were 
then stitched together downstream of a T7 promoter using overlap-extension PCR 
and Gibson cloning. For the PSMB6 5′ UTR luciferase reporter plasmid, the 5′ UTR 
was constructed by annealing primers together to create restriction-site-compatible 
overhangs. The 5′ UTRs were then inserted together with Renilla luciferase into 
pUC19 for c-JUN and pcDNA4 for BTG1 and PSMB6. The eIF3-binding mutants 
and BTG1 stem-loop chimaeras were made by inserting annealed primers after 
cutting the plasmid with enzymes flanking the desired insertion site. The BTG1 
overexpression plasmid was constructed by inserting the BTG1 open reading 
frame, isolated by PCR from human cDNA, into pcDNA4 modified with a Kozak 
sequence. siRNA pools used were siGENOME JUN (Dharmacon M-003268-03) and 
siGENOME Non-Targeting siRNA #3 (Dharmacon D-001210-03). 

Western blot. Western blot analysis was performed using the following antibodies: 
anti-eIF3a (Novus NBP1-18891); anti-eIF3b (Bethyl A301-761A); anti-eIF3c (Bethyl 
A300-376A); anti-eIF3d (Bethyl A301-758A); anti-eIF3e (Bethyl A302-985A); 
anti-eIF3f (Bethyl A303-005A); anti-eIF3g (Bethyl A301-757A); anti-eIF3h 
(Bethyl A301-754A); anti-eIF3i (Biolegend 646701); anti-eIF3k (Novus 
NB100-93304); anti-eIF3l (Genetex GTX120119); anti-eIF3m (Novus NBP1-56654); 
anti-rpS19 (Bethyl A304-002A); anti-eIF4G1 (Bethyl A301-775A); 
anti-c-JUN (Bethyl A302-959A); anti-BTG1 (Abcam ab151740); anti-GAPDH (Bethyl 
A300-640A); and anti-HSP90 (BD 610418). 

In vitro transcription. RNAs were made by in vitro transcription with T7 RNA 
polymerase (NEB). For luciferase RNAs, transcription was performed in the presence 
of 3′-O-Me-m⁷G(5′)ppp(5′)G RNA Cap Structure Analogue (NEB), using linearized 
plasmid as the template, and polyadenylated using polyA polymerase (Invitrogen). 
For gel shifts, annealed oligonucleotides were used as the template, and RNAs were 
radiolabelled by capping with vaccinia virus enzymes (NEB) and [α-³²P]-GTP. For 
SHAPE reactions, PCR templates were made using primers to add a 3′ handle 
(5′-GAACCGGACCGAAGCCCGGGCTGAG-3′), and transcription was performed 
using gel-extracted PCR products. RNAs were purified by phenol-chloroform 
extraction and ethanol precipitation or using the RNA Clean and Concentrator Kit (Zymo). 

In vitro translation. In vitro translation extracts were made from 293T cells using 
a previously described protocol. Briefly, cells were trypsinized and collected by 
centrifugation for 5 min at 1,000g at 4 °C. Cells were washed once with cold PBS 
(137 mM NaCl, 2.7 mM KCl, 100 mM Na₂HPO₄, 2 mM KH₂PO₄) and an equal 
volume of freshly made cold lysis buffer (10 mM HEPES-KOH pH 7.6, 10 mM 
KOAc, 0.5 mM Mg(OAc)₂, 5 mM dithiothreitol (DTT), and 1 Complete 
EDTA-free Proteinase Inhibitor Cocktail tablet (Roche) per 10 ml of buffer) was added. 
After hypotonic-induced swelling for 45 min on ice, cells were homogenized using 
a syringe attached to a 27G needle until ~95% of cells burst, as monitored by trypan 
blue staining. Lysate was centrifuged at 14,000g for 1 min at 4 °C, and supernatant 
was moved to a new tube, avoiding the top lipid layer. Lysates were quickly frozen 
with liquid nitrogen and stored at −80 °C. Each translation reaction contained 
50% in vitro translation lysate and buffer to make the final reaction with 0.84 mM 
ATP, 0.21 mM GTP, 21 mM creatine phosphate (Roche), 45 U ml⁻¹ creatine 
phosphokinase (Roche), 10 mM HEPES-KOH pH 7.6, 2 mM DTT, 2 mM Mg(OAc)₂, 
50 mM KOAc, 8 µM amino acids (Promega), 255 µM spermidine and 1 U µl⁻¹ murine 
RNase inhibitor (NEB). One millimolar m⁷G(5′)ppp(5′)G RNA cap structure analogue 
(NEB) was added to reactions when indicated. Translation reactions were incubated 
for 1 h at 30 °C, after which luciferase activity was assayed. 

eIF3 purification and native agarose gel electrophoresis. Recombinant eIF3 was 
expressed and purified from Escherichia coli and native human eIF3 was purified 
from HeLa cells as previously described. The gel shift protocol was adapted from 
previously described protocols. A 0.7% agarose gel was prepared using Agarose 
Type 1B (Sigma A0576) in buffer consisting of 1× TBE supplemented with 75 mM 
KCl, and gel and buffer were pre-cooled at 4 °C. For each gel shift, 2 µl water, 1 µl of 
5× Binding Buffer (125 mM Tris-HCl pH 7.5, 25 mM Mg(OAc)₂, 350 mM KCl, 
0.5 mM CaCl₂, 0.5 mg ml⁻¹ BSA, 10 mM TCEP), 1 µl labelled RNA and 1 µl of 
purified eIF3 or protein buffer were added, in the listed order, and incubated at 
25 °C for 30 min. One microlitre of room temperature 6× non-denaturing loading 
dye (40% w/v sucrose, with xylene cyanol and bromophenol blue) was added to the 
reactions and these were loaded on the agarose gel. The gel was run for 1 h at 40 V at 
4 °C, buffer was replaced with fresh cold buffer, and the gel was run for another 
hour at 40 V. The gel was placed on top of positively charged nylon membrane with 
four pieces of Whatman filter paper underneath, covered in saran wrap, and dried 
for 1 h at 75 °C on a pre-heated gel drier. The gel was imaged using a 
phosphorimager. 
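
The gel shift reaction described above totals 5 µl (2 µl water, 1 µl of 5× Binding Buffer, 1 µl RNA and 1 µl eIF3 or protein buffer), so the 5× stock is diluted to 1×. A minimal sketch of the resulting final concentrations follows. It considers only the Binding Buffer and ignores any salts carried in with the RNA or protein samples, which is a simplifying assumption; the function name `final_concentrations` is hypothetical.

```python
# Composition of the 5x Binding Buffer as listed in the protocol above.
STOCK_5X = {
    "Tris-HCl pH 7.5 (mM)": 125.0,
    "Mg(OAc)2 (mM)": 25.0,
    "KCl (mM)": 350.0,
    "CaCl2 (mM)": 0.5,
    "BSA (mg/ml)": 0.5,
    "TCEP (mM)": 10.0,
}

# Volumes added per gel shift reaction, in microlitres.
VOLUMES_UL = {"water": 2.0, "5x buffer": 1.0, "RNA": 1.0, "eIF3 or buffer": 1.0}


def final_concentrations(stock, buffer_ul, total_ul):
    """Dilute each stock component by buffer_ul / total_ul."""
    return {name: conc * buffer_ul / total_ul for name, conc in stock.items()}


total_ul = sum(VOLUMES_UL.values())
final = final_concentrations(STOCK_5X, VOLUMES_UL["5x buffer"], total_ul)
for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

Under these assumptions the binding reaction contains 25 mM Tris-HCl, 5 mM Mg(OAc)₂, 70 mM KCl, 0.1 mM CaCl₂, 0.1 mg ml⁻¹ BSA and 2 mM TCEP.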

SHAPE mapping of RNA structure. The SHAPE protocol was adapted from a 
previously described protocol. Each RNA folding reaction contained 1 µg of RNA, 
1.8 µl 5× annealing buffer (500 mM HEPES-KOH pH 8.0, 250 mM KCl, 12.5 mM 
MgCl₂), and water to make the total reaction volume 9 µl. RNAs were incubated at 
65 °C for 5 min, on ice for 5 min, and then at 25 °C for 5 min. To each tube, 1 µl of 
100% dimethylsulfoxide (DMSO) or 1 µl of 800 mM benzoyl cyanide (Sigma) was 
added, and the reaction was mixed by pipetting three times. The RNAs were 
immediately recovered by ethanol precipitation. Purified RNA was dissolved in 
9 µl of 0.5× TE buffer (5 mM Tris, 0.5 mM EDTA pH 8.0). Three microlitres of 
0.3 µM NED- or VIC-labelled primers were added to the modified and unmodified 
reactions, respectively. For sequencing reactions, 1 µg of RNA in 1 µl volume was 
mixed with 8 µl of 0.5× TE buffer, and 3 µl of 0.3 µM FAM- or PET-labelled primers 
were added to each tube with 1 µl of 10 mM ddATP or ddTTP. To each tube, 7 µl of 
reverse transcription buffer (250 mM KCl, 167 mM Tris-HCl pH 8.3, 1.67 mM 
dNTPs, 17 mM DTT and 10 mM MgCl₂) was added and the reactions were 
pre-warmed to 52 °C for 1 min. One microlitre of Superscript III (Invitrogen) was added 
and the tubes were incubated at 52 °C for 50 min, 65 °C for 5 min, and then put on 
ice. The RNA was hydrolysed by adding 0.5 µl 10 N NaOH and heating to 95 °C for 
3 min; the reactions were put on ice and then neutralized by adding 0.33 µl of 12.1 M 
HCl. cDNAs were recovered by ethanol precipitation and resuspended in 11 µl of 
deionized formamide. Fragment analysis was performed using an Applied Biosystems 
3730XL DNA Analyzer, and raw traces were analysed using Shapefinder software. 
Matrigel invasion assay. Matrigel assays were performed using Corning BioCoat 
Matrigel invasion chambers according to the manufacturer's protocol. Twenty-four 
hours after seeding the invasion chambers, invaded cells were fixed with 70% ethanol 
and stained with crystal violet before imaging. 

PAR-CLIP. Three biological replicates of PAR-CLIP were performed as previously
described, with some modifications. For each experiment, 40-50 150-mm plates of
293T cells were seeded to be at ~90% confluence during crosslinking. Fourteen
hours before crosslinking, 4-thiouridine (Sigma) was added to the media to a final
concentration of 100 µM. For crosslinking, the cells were washed with cold PBS
and then the plates were irradiated on ice with 0.15 J cm⁻² of UV 365 nm light. The
cells were scraped into PBS, pelleted by centrifugation at 1,000g for 5 min at 4 °C,
and the pellet was resuspended in three volumes of NP40 lysis buffer (50 mM
HEPES-KOH pH 7.5, 150 mM KCl, 2 mM EDTA, 0.5% Nonidet P-40 alternative, 0.5 mM
DTT, 1 Complete EDTA-free Proteinase Inhibitor Cocktail tablet per 50 ml of buffer).
The cell suspension was incubated on ice for 10 min, passed through an 18G needle
five times, and centrifuged at 13,000g for 15 min at 4 °C. The lysate was filtered through
a 0.2 µm membrane syringe filter and RNAs were lightly digested by treatment with
RNase T1 (Thermo Scientific) at a final concentration of 0.05 U µl⁻¹ for 15 min at
room temperature. For each plate, 5 µl of Dynabeads (Invitrogen) and 10 µl of
anti-eIF3b antibody (Bethyl A301-761A) were prepared by washing the beads once with
PBS and 0.2% Tween-20, and then allowing the antibody to bind to the beads in
PBS and 0.2% Tween-20 by rotating at room temperature for 15 min. The antibody
and beads were added to the lysates and the immunoprecipitation was rotated at
4 °C for 2 h.

The beads were collected and washed three times in high-salt NP40 wash buffer
(50 mM HEPES-KOH pH 7.5, 500 mM KCl, 0.5% Nonidet P-40 alternative, 0.5 mM
DTT, 1 Complete EDTA-free Proteinase Inhibitor Cocktail tablet per 50 ml of buffer).
One bead volume of NP40 lysis buffer and 50 U µl⁻¹ RNase T1 was added to the beads
and incubated for 16 min at room temperature. Beads were washed three times in
high-salt NP40 wash buffer and resuspended in one bead volume of Buffer 3 (NEB)
with 0.5 U µl⁻¹ Calf Intestinal Phosphatase (NEB). The reaction was incubated at
37 °C for 10 min, and beads were washed twice in phosphatase wash buffer (50 mM
Tris-HCl pH 7.5, 20 mM EGTA, 0.5% v/v Nonidet P-40 alternative) and twice in
PNK buffer without DTT (50 mM Tris-HCl pH 7.5, 50 mM NaCl, 10 mM MgCl₂).
Beads were resuspended in one bead volume of PNK buffer with 0.5 µCi µl⁻¹
[γ-³²P]-ATP and 1 U µl⁻¹ T4 PNK (NEB), and incubated for 20 min at 37 °C.
One-hundred micromolar nonradioactive ATP was added and the reaction was
incubated for 5 min at 37 °C, and then beads were washed five times with PNK buffer
without DTT. SDS-PAGE loading dye (50 mM Tris-HCl pH 6.8, 100 mM
β-mercaptoethanol, 2% w/v SDS, 10% v/v glycerol, 0.1% bromophenol blue) was added
to the beads, the sample was boiled for 5 min, and the sample was loaded onto a
4-12% Bis-Tris gel (Novex) and electrophoresed in MOPS buffer (2.5 mM MOPS,
2.5 mM Tris base, 0.005% w/v SDS, 1 mM EDTA). As a size standard, native eIF3 was
loaded onto the same gel.

The gel was imaged using a phosphorimager, a printed image was aligned to the
gel, and the complexes were excised and electroeluted in a D Tube Dialyzer Midi
(Millipore) for 2.5 h at 150 V, at 4 °C. The protein was digested with 1.2 mg ml⁻¹
Proteinase K (Roche) in Proteinase K buffer (50 mM Tris-HCl pH 7.5, 75 mM
NaCl, 6.25 mM EDTA, 1% w/v SDS) for 30 min at 37 °C. The RNA was isolated by
phenol-chloroform extraction and ethanol precipitation, and small RNA libraries
were prepared using a standard protocol. The cDNA libraries were sequenced on
an Illumina HiSeq 2000.

Mass spectrometry. Protein samples were prepared alongside the sequencing samples
used for RNA library preparation, using five plates and substituting nonradioactive
ATP during the T4 PNK labelling step. The samples were run on the same gel as the
radiolabelled PAR-CLIP samples and cut out using the phosphorimager printout
as a guide. Mass spectrometry samples were prepared by in-gel tryptic digestion and
peptides were identified by liquid chromatography-mass spectrometry (LC-MS).
Denaturing immunoprecipitation. The denaturing immunoprecipitation was
performed using the PAR-CLIP protocol, with the following alterations. Five plates
were used for each sample and, after crosslinking, one volume of NP40 lysis buffer
was added and the sample was incubated on ice for 10 min. The lysate was clarified
by centrifugation, the supernatant was transferred to a new tube, and one volume of
2X SDS lysis buffer (10% w/v SDS, 100 mM Tris-HCl pH 7.4, 10 mM EDTA, 20 mM
DTT) was added. The sample was boiled for 5 min, cooled on ice, and then diluted
at least tenfold with nondenaturing lysis buffer (1% v/v Triton-X-100, 50 mM
HEPES-KOH pH 7.5, 150 mM NaCl, 2 mM EDTA). Immunoprecipitation was performed
using an anti-eIF3d (Bethyl A301-758A) or anti-eIF3g (Bethyl A301-757A) antibody.
PAR-CLIP computational analysis. Raw Illumina reads were collapsed using
fastx_collapser from FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), and
3' adapters were removed using Cutadapt (http://code.google.com/p/cutadapt/). Reads
shorter than 15 nucleotides were discarded. To remove processed reads that align
to repeat elements, reads were mapped using Bowtie to the hg19 RepeatMasker
track from the UCSC table browser, and unmapped reads were retained. Retained
reads were mapped to the hg19 reference genome, allowing for up to two mismatches
in alignment. PARalyzer was used to identify read clusters, or eIF3 crosslinking sites,
with settings of five minimum read counts per group, cluster or kernel density
estimation (KDE), a minimum cluster size of 11 nucleotides, and a minimum
conversion count of 1. Clusters were annotated using iterative rounds of bedtools intersect,
with the following hierarchy: start codon, stop codon, 5' UTR, 3' UTR, CDS, intron,
lincRNA, miRNA, piwiRNA, snoRNA, snRNA, mitoRNA, rRNA, pseudogene, miscRNA.
Annotation data were from the following sources: Gencode (v17 annotation)
(http://www.gencodegenes.org), Ensembl BioMart, ncRNA database (http://www.ncrna.org/),
miRBase (http://www.mirbase.org). Clusters that aligned to intergenic regions or that
were antisense to the transcript were removed, along with any clusters that mapped
identically but in the correct sense, as these are probably due to incorrect mapping.
Next, the consensus set was defined as those clusters that were reproduced in
at least two of the three biological replicates, and this was determined using the
Bioconductor GenomicFeatures package.
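
The two-of-three reproducibility rule can be illustrated with a short sketch. This is not the authors' analysis, which used the Bioconductor GenomicFeatures package in R; it is a plain-Python stand-in that assumes each cluster is a tuple of chromosome, strand, start and end (half-open coordinates), counts a cluster as reproduced when it overlaps any same-strand cluster in another replicate, and leaves overlapping clusters unmerged. All names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

# Hypothetical cluster representation: (chromosome, strand, start, end), half-open.
Cluster = Tuple[str, str, int, int]


def overlaps(a: Cluster, b: Cluster) -> bool:
    """True if two clusters share a chromosome and strand and at least one base."""
    return a[0] == b[0] and a[1] == b[1] and a[2] < b[3] and b[2] < a[3]


def consensus_clusters(replicates: List[List[Cluster]], min_replicates: int = 2) -> List[Cluster]:
    """Keep clusters that are supported by at least `min_replicates` replicates."""
    kept = set()
    for i, replicate in enumerate(replicates):
        for cluster in replicate:
            # The cluster's own replicate counts as one supporting replicate.
            support = 1 + sum(
                any(overlaps(cluster, other) for other in other_replicate)
                for j, other_replicate in enumerate(replicates)
                if j != i
            )
            if support >= min_replicates:
                kept.add(cluster)
    return sorted(kept)


# Toy data with made-up coordinates.
rep1 = [("chr1", "+", 100, 130), ("chr2", "-", 500, 520)]
rep2 = [("chr1", "+", 110, 140)]
rep3 = [("chr3", "+", 10, 30)]
consensus = consensus_clusters([rep1, rep2, rep3])
print(consensus)
```

In the toy data only the two overlapping chr1 clusters survive, because each is supported by two replicates.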

RNA-sequencing. Two biological replicates of RNA-sequencing (RNA-seq) were
performed as follows. Polyadenylated mRNAs were isolated from 5 X 10° 293T
cells using the mRNA-DIRECT kit (Ambion). For alkaline hydrolysis fragmentation,
250 ng of mRNA was mixed with 5X fragmentation buffer (150 mM Mg(OAc)₂,
200 mM Tris-acetate pH 8.3, 500 mM KOAc) in a total volume of 20 µl and heated
at 94 °C for 6 min. The RNA was ethanol precipitated, and first-strand cDNA was
synthesized using random hexamers and Superscript III (Invitrogen), according to
the manufacturer's protocol. For the following steps, the cDNA was purified between each
step using a PCR purification column (Qiagen). To make second-strand cDNA,
10 µl 10X ligase buffer (NEB), 0.3 mM dNTP mix, 67 U ml⁻¹ E. coli DNA Ligase (NEB),
267 U ml⁻¹ E. coli DNA and 13.4 U ml⁻¹ RNase H (Invitrogen) were added to the
cDNA in a 100 µl reaction and incubated for 2.5 h at 16 °C. The cDNA was end
repaired in a 100 µl reaction with 0.4 mM dNTPs, 10 µl 10X T4 DNA ligase buffer
(NEB), 150 U ml⁻¹ T4 DNA Polymerase (NEB), 50 U ml⁻¹ T4 PNK (NEB) and
50 U ml⁻¹ Klenow (NEB) for 30 min at 20 °C. To A-tail the cDNA, 2 mM dATP,
32 µl cDNA, 5 µl 10X NEBuffer 2 (NEB) and 300 U ml⁻¹ Klenow exo- (3' to 5' exo
minus) (NEB) were combined in a 50 µl reaction, and incubated for 30 min at
37 °C. To prepare adapters, 40 µM universal adaptor and 40 µM indexed adaptor
were mixed with 2 µl 10X primer annealing buffer (100 mM Tris-HCl pH 8.0, 50 mM
NaCl) in a 20 µl reaction and heated at 95 °C for 15 min, 70 °C for 15 min, and slow
cooled to room temperature. Adapters were ligated in a 50 µl reaction with 5 µl 10X T4
DNA ligase buffer (NEB), 16,000 U ml⁻¹ T4 DNA ligase (NEB) and 2 µM adapters
for 15 min at room temperature. cDNA libraries were amplified by PCR using Phusion
(NEB) and 5 µM primer mix for 15 cycles of 10 s at 98 °C, 30 s at 65 °C and 30 s at
72 °C, and isolated by gel purification (Qiagen). The cDNA libraries were sequenced
using an Illumina HiSeq 2000. The following oligonucleotides were used. Universal
adaptor: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3'
(asterisk denotes a phosphorothioate bond); indexed
adaptor: 5'-/5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-index-ATCTCGTATGCCGTCTTCTGCTTG-3',
with the index sequence being CGATGT
or TGACCA; primer mix: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA-3'
and 5'-CAAGCAGAAGACGGCATACGAGAT-3'.
RNA-seq computational analysis. After quality filtering with the FASTX Toolkit,
reads were mapped to the human hg19 genome using Tophat and the Gencode (v17)
annotation. FPKM was calculated using a Python script, and the average FPKM
was calculated using the two biological replicates.
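
The authors' script is not reproduced in the text, so the following is only a minimal sketch of the standard FPKM definition (fragments per kilobase of exon per million mapped fragments) followed by averaging over replicates. The function names and the example counts are hypothetical.

```python
def fpkm(fragment_count: float, transcript_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of exon per million mapped fragments.

    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    """
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def mean_fpkm(replicate_values):
    """Arithmetic mean of FPKM values across biological replicates."""
    return sum(replicate_values) / len(replicate_values)


# Made-up counts for one 2,000-nt transcript in two replicates.
rep1 = fpkm(fragment_count=500, transcript_length_bp=2000, total_mapped_fragments=10_000_000)
rep2 = fpkm(fragment_count=300, transcript_length_bp=2000, total_mapped_fragments=5_000_000)
average = mean_fpkm([rep1, rep2])
print(rep1, rep2, average)
```

Normalizing by both transcript length and library size is what lets the two replicates, sequenced to different depths in this toy case, be averaged directly.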

RNA immunoprecipitation and RT-PCR. Two 150 mm plates of 293T cells were
lysed in three volumes of NP40 lysis buffer. Dynabeads were prepared with rabbit
IgG (Cell Signaling 2729), rabbit anti-HA antibody (Invitrogen 71-5500) or rabbit
anti-eIF3b antibody (Bethyl A301-761A). The lysate was split into three parts, the
different antibody-Dynabead mixtures were added, and the suspension was
incubated for 2 h at 4 °C. The beads were washed four times with high NP40 wash
buffer (50 mM HEPES-KOH pH 7.5, 500 mM KCl, 2 mM EDTA, 1% Nonidet P-40
alternative, 0.5 mM DTT), and bound RNAs were isolated by phenol-chloroform
extraction and ethanol precipitation. cDNA was reverse transcribed using random
hexamers and Superscript III, and PCR was performed using Phusion. The following
oligonucleotides were used. RANGAP1-Forward, 5'-ACCGTCTGGAGAATGATGG-3';
RANGAP1-Reverse, 5'-CGCAAGGTCTTCAAGGTCTC-3'; JUN-Forward,
5'-TGACTGCAAAGATGGAAACG-3'; JUN-Reverse, 5'-CCGTTGCTGGACTGGATTAT-3';
BTG1-Forward, 5'-CACTGGTTCCCAGAAAAGC-3'; BTG1-Reverse,
5'-CTACCATTTGCACGTTGGTG-3'; PPP3R1-Forward, 5'-GAATTCATTGAGGGCGTCTC-3';
PPP3R1-Reverse, 5'-GCCACCTACAACAGCACAGA-3'; CDK12-Forward,
5'-CAAATTCTCAGCCCCCTGTA-3'; CDK12-Reverse, 5'-GAGGTGGTGTGATTGCCTTT-3';
PSMB6-Forward, 5'-ACTGGGAAAGCCGAGAAGTT-3'; PSMB6-Reverse,
5'-TCCCGGTAGGTAGCATCAAC-3'.

[Extended Data Figure 1 image. a, Amino-acid sequences of eIF3a (NP_003741), eIF3b (NP_001032360), eIF3d (NP_003744) and eIF3g (NP_003746). b, IP: eIF3d. c, IP: eIF3g. Molecular mass markers are given in kDa.]
Extended Data Figure 1 | PAR-CLIP reveals eIF3a, b, d and g bind to RNA.
a, Mass spectrometry identification of trypsin-released peptides from
RNA-crosslinked eIF3 subunits. Peptides identified by mass spectrometry are
highlighted in pink. b, c, Crosslinking and denaturing immunoprecipitation to
validate subunit identification. As eIF3d and g co-migrate with eIF3l and e/f,
respectively, subunit identification was validated by immunoprecipitation of
individual proteins after crosslinking and treatment of lysates with SDS
and boiling.
Extended Data Figure 2 | Analysis of eIF3 PAR-CLIP targets. a, Scatterplot
of fragments per kilobase of exon per million reads (FPKM) of all mRNAs
expressed in 293T cells. mRNAs that are eIF3 PAR-CLIP targets are highlighted
in red. b, Scatterplot of correlation between mRNA expression and PAR-CLIP
read coverage for mRNAs that are eIF3 PAR-CLIP targets. The simple linear
regression line is plotted in blue, with the 95% confidence region shaded in grey.
Extended Data Figure 3 | Conservation of c-JUN and BTG1 eIF3-binding
sites in primates and mammals. a, b, The eIF3-binding site is indicated in
cyan. nt, nucleotides. a, c-JUN GenBank accessions are: human (NM_002228.3,
Homo sapiens), chimpanzee (XM_513442.5, Pan troglodytes), gorilla
(XM_004025880.1, Gorilla gorilla), orangutan (XM_002810763.3, Pongo
abelii), rhesus macaque (NM_001265850.2, Macaca mulatta), marmoset
(XM_002750880.3, Callithrix jacchus), mouse (NM_010591.2, Mus musculus),
cat (XM_006934825.1, Felis catus). b, BTG1 GenBank accessions are: human
(NM_001731.2, Homo sapiens), chimpanzee (XM_509262.3, Pan
troglodytes), orangutan (XM_002823578.2, Pongo abelii), rhesus macaque
(NM_001266672.1, Macaca mulatta), marmoset (XM_002752814.3, Callithrix
jacchus), mouse (NM_007569.2, Mus musculus), cat (XM_006933950.1, Felis
catus), cow (NM_173999.3, Bos taurus).
Extended Data Figure 4 | Interactions between native and recombinant
eIF3 and the c-JUN and BTG1 RNA stem-loops. a, Coomassie blue staining
of purified native HeLa eIF3 or recombinant eIF3, resolved by SDS-PAGE.
b, Representative native agarose gel electrophoresis shows a specific and binary
interaction between native (Nat) and recombinant (Rec) eIF3 and the wild-type
(WT) c-JUN stem-loop structure, but not with the mutated stem-loop or the
wild-type BTG1 stem-loop.
Extended Data Figure 5 | Luciferase activity of c-JUN and BTG1 mutants in
cells. a, b, Luciferase activity in 293T cells transfected with mRNAs containing
the c-JUN 5' UTR with a mutated stem-loop (a) or the PSMB6 5' UTR-BTG1
stem-loop chimaera (b). Mut, mutant; Rev, transversed; SL, stem-loop; WT,
wild type. The results are given as the mean ± s.d. of three independent
experiments, each performed in triplicate.
Extended Data Figure 6 | Bypassing eIF3 translational control in H1299
cells reduces cell invasiveness. a, Functional classification of eIF3-bound
RNAs. b, Representative western blot analysis of eIF3a expression levels in
H1299 and IMR90 cells. GAPDH was detected as a loading control for
normalized protein levels. c, Representative image of Matrigel invasion by
H1299 or IMR90 cells. d, BTG1 protein levels after overexpression in H1299
cells. HSP90 was detected as a loading control. e, Matrigel invasion assay in
H1299 cells after overexpression of BTG1. ORF, open reading frame. f, c-JUN
protein levels after siRNA-mediated knockdown in H1299 cells. NT,
non-targeting. g, Matrigel invasion assay in H1299 cells after knockdown of c-JUN.
The results of e and g are given as the mean ± s.d. of three independent
experiments, each performed in duplicate.
Extended Data Figure 7 | Schematic of eIF3 subunit localization on the
small ribosomal subunit. The eIF3 subunits bound to RNA in the PAR-CLIP
experiment, eIF3a, b and g, form a nexus in the distal eIF3 region. The location
of eIF3d has not been assigned, and the schematic is adapted from ref. 14.


©2015 Macmillan Publishers Limited. All rights reserved 



doi:10.1038/nature14268 


New cosmogenic burial ages for Sterkfontein 
Member 2 Australopithecus and Member 5 Oldowan 


Darryl E. Granger, Ryan J. Gibbon, Kathleen Kuman, Ronald J. Clarke, Laurent Bruxelles & Marc W. Caffee


The cave infills at Sterkfontein contain one of the richest assemblages
of Australopithecus fossils in the world, including the nearly
complete skeleton StW 573 (‘Little Foot’) in its lower section, as
well as early stone tools in higher sections. However, the chronology
of the site remains controversial owing to the complex history of
cave infilling. Much of the existing chronology based on uranium-lead
dating and palaeomagnetic stratigraphy has recently been
called into question by the recognition that dated flowstones fill
cavities formed within previously cemented breccias and therefore
do not form a stratigraphic sequence. Earlier dating with cosmogenic
nuclides suffered a high degree of uncertainty and has been
questioned on grounds of sediment reworking. Here we use isochron
burial dating with cosmogenic aluminium-26 and beryllium-10
to show that the breccia containing StW 573 did not undergo significant
reworking, and that it was deposited 3.67 ± 0.16 million years
ago, far earlier than the 2.2 million year flowstones found within
it. The skeleton is thus coeval with early Australopithecus afarensis
in eastern Africa. We also date the earliest stone tools at
Sterkfontein to 2.18 ± 0.21 million years ago, placing them in the
Oldowan at a time similar to that found elsewhere in South Africa
at Swartkrans and Wonderwerk.

The cave at Sterkfontein is partly filled with overlapping layers of
fossiliferous breccia that entered through multiple openings to the
surface. The infill was originally divided into six members thought to
be in stratigraphic order, with Members 1-3 inside the cave and 4-6
now exposed at the surface owing to erosion of the cave roof. Although
the complete infill stratigraphy is not exposed in any one place and the
temporal relationship between the interior and surface deposits remains
debated, we retain the original nomenclature here. We will focus
on Member 2 within the Silberberg Grotto (Fig. 1) and on the
Oldowan Infill of Member 5 in younger deposits excavated from a
higher infill.

Member 2 contains abundant fossils, angular dolomite and chert
clasts, and quartz-bearing sand. Several localized flowstones and
botryoidal calcite deposits fill cavities that formed after the breccia was
cemented and later settled into voids dissolved below (Fig. 1). Fauna
was accumulated as a deathtrap assemblage including associated
elements, largely of primates and carnivores, with no hominids apart from
a single near-complete skeleton of Australopithecus prometheus (StW
573; Fig. 2). This species was named on the basis of a
parieto-occipital fossil from Makapansgat. It has been suggested that several
other Sterkfontein and some Makapansgat specimens also belong
in this species, making Australopithecus africanus and A. prometheus
contemporaries in the assemblages of Makapansgat Member 3 and
Sterkfontein Member 4. A. prometheus differs from A. africanus in
features including Paranthropus-like larger, bulbous-cusped cheek teeth,
a longer, flatter face, incipient supraglabellar hollowing and a more
vertical rounded occiput. (Note that we use the term hominid in the
traditional sense to include humans and their ancestral relatives but
exclude the great apes.)

Dating of Member 2 and StW 573 has been problematic. Flowstones
in the vicinity of StW 573 date to about 2.2 million years (Myr), but
they post-date the breccia and the fossil. The only previous date on
the breccia itself was cosmogenic ²⁶Al/¹⁰Be burial dating of fine-grained
quartz, which yielded a best-fit age of 4.17 ± 0.35 Myr. This age has
been questioned by many who have suggested that fine sediment
could have been reworked from older, higher deposits within the cave,
making the burial age of the sediment older than the fossil. To resolve
the age of the fossil, the breccia must be dated and it must be shown to
be a coherent stratigraphic unit, largely free of reworked material. This
is now possible owing to improvements in measurement precision and
new techniques such as isochron burial dating, which can explicitly
validate the coeval deposition of the entire unit.

Member 5 contains both Homo ergaster and Paranthropus fossils as
well as Oldowan and Acheulean stone tools. Member 5 East is divided
into a lower Oldowan infill, with the first appearance of stone tools and
a few fossils of Paranthropus, and an overlying early Acheulean infill.
Faunal comparisons and the Paranthropus hominid StW 566 suggested
an age estimate of 1.7-2.0 Myr for the Oldowan infill. A substantially
younger age of 1.32 ± 0.08 Myr (error-weighted mean) has been
inferred from electron spin resonance dating of bovid teeth. We
use burial dating of a quartz manuport to determine the age of the
Oldowan infill.

Burial dating is based on the radioactive decay of ²⁶Al and ¹⁰Be in
quartz. These nuclides build up by exposure to secondary cosmic
radiation near the ground surface, and subsequently decay when sediment
is buried and cosmogenic nuclide production is attenuated. Because
²⁶Al (τ₂₆ = 1.021 ± 0.024 Myr (ref. 28)) decays faster than ¹⁰Be
(τ₁₀ = 2.005 ± 0.020 Myr (ref. 29)), the ratio ²⁶Al/¹⁰Be decreases over
time, with an effective mean-life of τ_bur = 2.08 ± 0.10 Myr. For burial
dating to be accurate, three criteria must be met. (1) The quartz must be
exposed near the ground surface before burial to accumulate sufficient
²⁶Al and ¹⁰Be. (2) It must be buried quickly and deeply enough so that
post-burial production is small. The exact depth required depends
upon the inherited concentrations, but is usually many metres. (3) It
must be buried only once in the past ~10 Myr. If quartz has been
reworked from older deposits, or if it has been reworked underground
within the cave system, then the burial age will overestimate the true
age of the deposit.
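
The effective mean-life quoted above follows directly from the two nuclide mean-lives, since the ratio decays as R(t) = R₀ exp(−t/τ_bur) with 1/τ_bur = 1/τ₂₆ − 1/τ₁₀. The sketch below reproduces that value and inverts the decay law for a simple burial age. It assumes no post-burial production, and the inherited ratio of 6.8 is an illustrative assumption that does not come from the text.

```python
import math

TAU_26AL_MYR = 1.021  # mean-life of 26Al quoted in the text (ref. 28)
TAU_10BE_MYR = 2.005  # mean-life of 10Be quoted in the text (ref. 29)


def effective_burial_mean_life(tau_26: float, tau_10: float) -> float:
    """Mean-life of the 26Al/10Be ratio: 1/tau_bur = 1/tau_26 - 1/tau_10."""
    return 1.0 / (1.0 / tau_26 - 1.0 / tau_10)


def simple_burial_age(measured_ratio: float, inherited_ratio: float, tau_bur: float) -> float:
    """Invert R(t) = R0 * exp(-t / tau_bur), ignoring post-burial production."""
    return tau_bur * math.log(inherited_ratio / measured_ratio)


tau_bur = effective_burial_mean_life(TAU_26AL_MYR, TAU_10BE_MYR)
print(f"tau_bur = {tau_bur:.2f} Myr")

# Hypothetical example: a ratio that has fallen to half of an assumed inherited value of 6.8.
ASSUMED_INHERITED_RATIO = 6.8  # assumption for illustration, not a value from the text
age = simple_burial_age(ASSUMED_INHERITED_RATIO / 2.0, ASSUMED_INHERITED_RATIO, tau_bur)
print(f"age for a halved ratio = {age:.2f} Myr")
```

A halved ratio corresponds to τ_bur × ln 2, which is why the method remains sensitive over several million years.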

An elegant way to test whether the burial dating criteria are met is to
construct an isochron in which multiple samples are analysed from
the same location. Each sample is buried with its own inherited ²⁶Al and
¹⁰Be concentrations, but all samples share the same post-burial
production history. A plot of ²⁶Al versus ¹⁰Be yields a gentle curve with a slope
that indicates burial age and an intercept that depends on the amount of
post-burial production. The isochron burial dating method accounts
for post-burial production without requiring detailed knowledge of
the burial depth or burial history. It also allows outliers to be identified;
reworked samples plot below the isochron, while samples significantly
above the isochron are forbidden and indicate issues with either the
sample or the laboratory measurements.

Figure 1 | Stratigraphy and sample locations. Measured stratigraphic
section through the Member 2 talus at the location of the StW 573 skeleton
showing locations of dated samples, modified from ref. 14. Locations of U/Pb
samples are estimated from schematic sections of refs 10, 11; palaeomagnetic
samples were located from refs 8, 12. Inset locates the cross section in the lower
part of the Silberberg Grotto, with approximately 1 m contour intervals for the
infill surface.

We analysed 11 samples from Member 2 (Table 1), including three
previously reported. Effective isochron burial dating requires a wide
range of inherited cosmogenic nuclide concentrations. To that end, we
selected a suite of samples to maximize variability. Fine quartz sand from
multiple samples (ST 1-9) was probably washed in from the surface. In
contrast, four blocks of chert were collected from the immediate vicinity
of StW 573 (M2CA-D). Two fractions of coarse sand and pebbles were
separated (ST M2 Dark and Light). One fraction comprises rounded
grains stained with pedogenic iron oxides and washed into the cave from
soil at the surface; the second comprises angular unstained grains
probably eroded from the walls and ceiling of the cave itself (Extended Data
Fig. 1). A previously reported sample from the modern surface was
analysed to confirm that material enters the cave with a zero burial age.

From the Oldowan Infill of Member 5 we selected a single quartz
manuport, a typical vein quartz cobble with rounding and impact
marks characteristic of rocks found in the local river gravels close to
Sterkfontein (Extended Data Fig. 2). There is no evidence for reworking
of older deposits, as there are no diagnostically younger artefacts within
the large Oldowan assemblage of 3,500 pieces.

²⁶Al and ¹⁰Be were measured by accelerator mass spectrometry (AMS).
All samples of fine sand and the iron-stained grains have high ²⁶Al and
¹⁰Be concentrations, confirming their origin from outside the cave.
Light-coloured grains and chert blocks have low concentrations, indicating
that they were probably eroded from the walls of the cave within a few
metres of the surface. A plot of ²⁶Al versus ¹⁰Be (Fig. 3) reveals that all
but two of the samples lie on an isochron, consistent with a single
episode of deposition. One chert block lies below the isochron, indicating
that it was reworked from an older deposit within the cave, perhaps
from talus of Member 1, nearby. Another chert sample has a ²⁶Al/¹⁰Be
ratio far into the forbidden zone above the isochron, indicating a
problem. Because this was a small sample there is no remaining chert for
re-analysis; it is not included in the age determination.

Figure 2 | Skull of StW 573 (‘Little Foot’). The skull, recently extracted from
the cave breccia. Photo by Jason Heaton.

Table 1 | Samples and cosmogenic nuclide concentrations

Sample        Location                     [¹⁰Be] (10⁶ atoms per gram)*   [²⁶Al] (10⁶ atoms per gram)
ST1           0.7 m below StW 573          0.493 ± 0.026                  0.624 ± 0.053
ST2           Adjacent to StW 573          0.574 ± 0.025                  0.565 ± 0.050
ST3           0.8 m above StW 573          0.522 ± 0.029                  0.562 ± 0.122
ST7           Surface above cave           1.166 ± 0.020                  7.075 ± 0.380
ST8           0.7 m NW of StW 573          0.685 ± 0.137                  0.686 ± 0.074
ST9           2-2.5 m below StW 573        0.479 ± 0.012                  0.550 ± 0.036
ST M2 Dark    From samples ST 1, 2, 8, 9   0.354 ± 0.025                  0.412 ± 0.044
ST M2 Light   From samples ST 1, 2         0.118 ± 0.005                  0.205 ± 0.015
M2CA          Near StW 573                 0.101 ± 0.004                  0.099 ± 0.009
M2CB          Near StW 573                 0.070 ± 0.004                  0.179 ± 0.015
M2CC          Near StW 573                 0.043 ± 0.002                  0.083 ± 0.012
M2CD          Near StW 573                 0.157 ± 0.006                  0.955 ± 0.036
Manuport      Oldowan Infill, Member 5     1.623 ± 0.070                  3.051 ± 0.295

*All ¹⁰Be measurements adjusted to the standard of ref. 30. ST 1-3 are slightly different than reported in ref. 9 owing to inclusion of additional analyses.
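
The isochron idea can be sketched on synthetic data. The example below is a simplified, linear version of the method and not the authors' error-weighted fit: it assumes every sample was buried with the same inherited ²⁶Al/¹⁰Be ratio (taken here as 6.8, an illustrative value not given in the text), so the isochron is a straight line rather than the gentle curve described above, and it uses an unweighted least-squares fit. All sample values are made up.

```python
import math

TAU_26AL_MYR = 1.021  # mean-life of 26Al quoted in the text
TAU_10BE_MYR = 2.005  # mean-life of 10Be quoted in the text
TAU_BUR_MYR = 1.0 / (1.0 / TAU_26AL_MYR - 1.0 / TAU_10BE_MYR)
ASSUMED_INHERITED_RATIO = 6.8  # hypothetical 26Al/10Be ratio at the time of burial


def fit_line(xs, ys):
    """Unweighted least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def isochron_age(be10, al26, inherited_ratio=ASSUMED_INHERITED_RATIO):
    """Burial age from the isochron slope: slope = R_inherited * exp(-t / tau_bur)."""
    slope, intercept = fit_line(be10, al26)
    return TAU_BUR_MYR * math.log(inherited_ratio / slope), intercept


def synthetic_samples(age_myr, inherited_be10, post_be10, post_al26,
                      inherited_ratio=ASSUMED_INHERITED_RATIO):
    """Decay each inherited inventory for age_myr, then add a shared post-burial term."""
    be10 = [n * math.exp(-age_myr / TAU_10BE_MYR) + post_be10 for n in inherited_be10]
    al26 = [n * inherited_ratio * math.exp(-age_myr / TAU_26AL_MYR) + post_al26
            for n in inherited_be10]
    return be10, al26


# Four made-up samples with different inherited 10Be but one shared burial history.
be10, al26 = synthetic_samples(3.0, [0.1, 0.4, 0.8, 1.5], post_be10=0.02, post_al26=0.05)
age, intercept = isochron_age(be10, al26)
print(f"recovered age = {age:.3f} Myr, intercept = {intercept:.4f}")
```

Because the shared post-burial term only shifts every point by the same amount, it changes the intercept but not the slope, which is why the age can be recovered without knowing the burial depth.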

The burial age for Member 2 is calculated as 3.67 ± 0.16 Myr. The
concentration of ¹⁰Be produced after burial is calculated as (21 ± 3) × 10³
atoms per gram, corresponding to a post-burial production rate of
about 0.012 atoms of ¹⁰Be per gram per year, a value consistent with
deep burial. The burial age of the surface sample is 0.11 ± 0.11 Myr,
consistent with zero. Its concentrations indicate a surface erosion rate
of 5.5 ± 0.5 m Myr⁻¹ for ¹⁰Be and 6.0 ± 0.6 m Myr⁻¹ for ²⁶Al.

Several factors have contributed to lowering the age of Member 2 
from that previously reported for sample ST 2 (4.17 + 0.35 Myr)”, even 
though its ‘Be and *°Al concentrations did not change substantially. 
Since the time of the previous publication the mean-life of '°Be has been 
re-evaluated and raised from 1.93 Myr to 2.005 Myr (ref. 29), decreas- 
ing the burial age. In addition, post-burial production by muons was 
previously overestimated, making the inferred burial age too old. 
Although production rates by muons at depth have been revised’’, 
the isochron method explicitly solves for post-burial production and 
avoids the need for theoretical production rate calculations, making the 
method inherently more robust. Finally, rather than relying ona single 
sample, the new calculations consider nine samples simultaneously; 
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Figure 3 | Burial dating isochron. Cosmogenic *°Al and '°Be concentrations 
for individual samples from Member 2, shown as 1a error ellipses. The solid 
curve shows the error-weighted best fit, and dashed curves illustrate 1o 

error bounds. One sample shown as an open symbol lies below the isochron and 
has been reworked from an older deposit. A single outlier lies far above the line 
and has been excluded from analysis. The remaining nine samples are all 
consistent with a single age of deposition at 3.67 + 0.16 Myr ago. 


using revised values sample ST 2 alone would yield an age of 
3.94 + 0.20 Myr, older than but well within measurement uncertainty 
of the joint solution. 

The new age of the Member 2 breccia and the StW 573 skeleton 
encased within it is in accordance with stratigraphic and taphonomic 
data™* suggesting that they are older than Member 4 with its abundant 
Australopithecus fossils. StW 573 thus represents an earlier individual 
that is older than similar fossils from Makapansgat and contemporary 
with some A. afarensis fossils such as at Laetoli’, and a partial skeleton 
from Woranso-Mille, Ethiopia’®. The demonstration that A. prometheus 
in South Africa was contemporary with the morphologically very dif- 
ferent A. afarensis of eastern Africa now raises interesting questions 
about early hominid diversity and phylogenetic relationships. 

The burial age for the manuport from the Oldowan infill, calculated 
for its current burial depth of 7m and a surface erosion rate of 
5mMpyr * is 2.18 + 0.21 Myr. The Oldowan at Sterkfontein is now 
placed at a time compatible with sites elsewhere in Africa, near 2 Myr 
ago, and with the date of approximately 1.8 Myr ago at Wonderwerk”’. 
It is close to the cosmogenic burial age of 2.19 + 0.08 Myr for a manu- 
port found in the Lower Bank of Member 1 at Swartkrans’”, only about 
1 km away. Taken together, these dates show that Oldowan technology 
was present in South Africa by 2 Myr ago. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. 

Samples of breccia were first treated with acid to dissolve carbonate cement and 
dolomite blocks. Fine quartz sand (<0.25 mm) was sieved for dating because it 
contained fewer pieces of dark chert visible by eye. Later, coarse sand and pebbles 
stained with iron oxides were picked by hand from samples ST 1, 2, 8 and 9. Light- 
coloured angular sand and pebbles were separated by hand from samples ST 1 and 
2 (Extended Data Fig. 1). The manuport (Extended Data Fig. 2) was cleaned and 
crushed to less than 0.5mm. Quartz from all samples was purified by repeated 
leaching in hot agitated 1% HF/HNO3. 

The clean quartz fractions from all samples were dissolved in 5:1 HF/HNO3 and 
spiked with °Be prepared from beryl. Upon dissolution, an aliquot was taken for 
stable Al determination. The sample was then evaporated and fumed to dryness in 
H,SO,. Be and Al were extracted by ion exchange chromatography. Both elements 
were precipitated as hydroxides and calcined at approximately 1,100 °C for 1h 
following standard procedures. 

?©A1/°7 Aland '’Be/*Be were measured by AMS at the Purdue Rare Isotope Mea- 
surement Laboratory (PRIME Lab), Purdue University. *°Al/?”Al measurements 
for samples ST 1-3 originally reported in ref. 9 were performed at Lawrence Liver- 
more National Laboratory; measurements reported here were made at PRIME Lab 
in 2014 on archived Al,O3 from the same samples. Stable Al measurements for 
samples ST 1-3 were measured by flame atomic absorption spectrophotometry; all 
others were measured by inductively coupled plasma-optical emission spectro- 
metry (ICP-OES). A conservative uncertainty of 5% was assigned to the atomic 
absorption spectrophotometry measurements, and 2% to measurements by ICP- 
OES. All?°Al measurements except three (ST 3, M2CC and Manuport) were made 
in 2014 using a gas-filled magnet. The gas-filled magnet suppresses isobaric inter- 
ference from 7°Mg and allows injection of the AIO” molecular ion into the AMS, 
resulting in 10-20 times higher beam current and improved precision. 

Because measurements were made over a period spanning more than a decade, 
there have been changes in the AMS standards that must be accounted for. Mea- 
surements reported in ref. 9 were normalized to '°Be standards prepared from a 
standard solution from the National Institute of Standards and Technology. All 
others were normalized at the time of measurement to standards prepared in ref. 30. 
All !°Be values were adjusted to match the currently accepted values of ref. 30. All 
measurements of *°Al/*’Al were normalized to standards of ref. 28. 

A derivation of the isochron dating method employed here is provided in detail 
in ref. 24. It is based on equation (1), which shows that cosmogenic ?6Al and !°Be 
concentrations depend on the decayed inherited concentrations and any accumu- 
lation after burial. 


(N26 — N26, post-buriat) / (Nio — No, post-burial) =N26, inh /Nio, inh XP(—t/Tbur) (1) 


In equation (1) N represents concentration, the numeric subscripts represent *°Al 
and '°Be, and the subscripts postburial and inh represent cosmogenic nuclide 
accumulation that postdates and predates burial. The variable ¢ represents burial 
age and Tpur is given by (1/thur = 1/t26 — 1/T1). 


LETTER 


The inherited ratio in equation (1) can be determined by assuming that the rocks 
being dated were derived from a steadily eroding landscape. In this case, the ratio 
is governed by equation (2), expressed as a function of Njo, where P represents 
the cosmogenic nuclide production rate at the sediment source area. 


Nog, inh / Nio, inh = (P26 /Pio) /[1 +Nio/(Piotbur)| (2) 


The ratio N36, post-burial/ io, post-burial Can be modelled using equation (3), assum- 
ing a constant production rate over the entire burial episode. 


Ny, post-burial /Nio, post-burial = 


(3) 
[P2st26(1 —exp(—t/t26))]/[Piot10(1 — exp(—t/T10))] 
Combining equations (1-3) leads to an expression for an isochron in which Ny¢ is 
a function of Nj and only two unknowns: t and No, post-burial 


Ny = (Nio —Nyo, post-burial) [Pas exp(—t/Thur)/(. + Nig exp(t/T10)/(Pi0 Tour) 
+ Mo, post-burial [P2626(1 — exp(—t/t26))]/[P1ot10(1 — exp(—t/T10))] 


Equation (4) can be used with a suite of samples to solve for both the burial age 
and the post-burial component of cosmogenic nuclides. We used equation (4) to 
solve for the age of the Member 2 breccia, with uncertainties determined by 
Monte Carlo analysis. The best fit age is 3.67 + 0.16 Myr, and the best fit value 
for Njo,post-burial is (21 + 3) X 10° atoms per gram. The solution is shown graph- 
ically in Fig. 3. 

For the Oldowan Infill, with only one sample, it is not possible to use an iso- 
chron. We corrected for post-burial production beneath an eroding surface follow- 
ing ref. 17. Post-burial production rates for a burial depth of 7 m and a bulk density 
of 2.0gcm 7° were calculated using a multi-exponential profile adjusted for the muon 
cross sections given in ref. 27. We calculated the burial ages three ways (Extended 
Data Table 1): a minimum age was calculated that ignored post-burial production 
completely, and would be correct if erosion rates (¢) at the site were extremely fast; 
a maximum burial age was calculated by assuming that the burial depth had not 
changed over time—that is, that erosion rate was zero; finally, an optimum age was 
calculated using a reasonable value for erosion of the ground surface, which caused 
the burial depth to change over time. We assume that the ground surface eroded at 
5m per million years, consistent with the value reported here. Cosmogenic nuclide 
production rates of 10.8 and 73.1 atoms per gram per year at the surface were 
calculated for a latitude of 26° S and an elevation of 1,500 m, with a 7°Al/!°Be pro- 
duction rate ratio of 6.8. Previous work using this method at Swartkrans nearby’” 
has yielded burial ages concordant with U/Pb ages of capping flowstones, support- 
ing its accuracy. 

Reported uncertainties are measurement errors only. We do not include uncer- 
tainties in cosmogenic nuclide production rates (which are generally minor for 
burial dating), in the 6 A1/°Be production rate ratio or in radioactive mean-lives. 
Accounting for uncertainty in the mean-lives would lead to an additional ~5% 
systematic uncertainty in the final ages, resulting in ages with total uncertainties of 
3.67 + 0.24 Myr for Member 2 and 2.18 + 0.24 Myr for the manuport. 
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Extended Data Figure 1 | Hand-picked samples. Dark-coloured and light- | M2 middle is ST 2. Light-coloured angular clasts in the top two dishes were 
coloured grains separated for samples M2 Dark and M2 Light. Each dish combined into sample M2 Light, while the iron-stained and rounded clasts in 
contains grains from the labelled original sample; M2 lower issampleST1,and the remaining dishes were combined into sample M2 Dark. 
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Extended Data Figure 2 | Manuport. Quartz manuport analysed from the 
Oldowan Infill. Maximum dimension is 67 mm. Sample recovered from Square 
Q57 spit 27’ 8'’-28' 8”". 
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Extended Data Table 1 | Burial ages for Oldowan manuport 


Minimum Maximum Optimum 
(e = fast) (e = 0) (e = 5 m/Myr) 
2.09 + 0.20 Myr 2.21 +0.21 Myr 2.18 + 0.21 Myr 
New cosmogenic burial ages for Sterkfontein Member 2 Australopithecus and Member 5 Oldowan


Darryl E. Granger, Ryan J. Gibbon, Kathleen Kuman, Ronald J. Clarke, Laurent Bruxelles & Marc W. Caffee


The cave infills at Sterkfontein contain one of the richest assemblages of Australopithecus fossils in the world, including the nearly complete skeleton StW 573 (‘Little Foot’)¹⁻⁴ in its lower section, as well as early stone tools⁵⁻⁷ in higher sections. However, the chronology of the site remains controversial⁸⁻¹⁴ owing to the complex history of cave infilling. Much of the existing chronology based on uranium-lead dating¹⁰,¹¹ and palaeomagnetic stratigraphy⁸,¹² has recently been called into question by the recognition that dated flowstones fill cavities formed within previously cemented breccias and therefore do not form a stratigraphic sequence⁴,¹⁴. Earlier dating with cosmogenic nuclides⁹ suffered a high degree of uncertainty and has been questioned on grounds of sediment reworking¹⁰,¹¹,¹³. Here we use isochron burial dating with cosmogenic aluminium-26 and beryllium-10 to show that the breccia containing StW 573 did not undergo significant reworking, and that it was deposited 3.67 ± 0.16 million years ago, far earlier than the 2.2 million year flowstones found within it¹⁰,¹¹. The skeleton is thus coeval with early Australopithecus afarensis in eastern Africa¹⁵,¹⁶. We also date the earliest stone tools at Sterkfontein to 2.18 ± 0.21 million years ago, placing them in the Oldowan at a time similar to that found elsewhere in South Africa at Swartkrans¹⁷ and Wonderwerk¹⁸.

The cave at Sterkfontein is partly filled with overlapping layers of fossiliferous breccia¹⁹,²⁰ that entered through multiple openings to the surface. The infill was originally divided into six members thought to be in stratigraphic order¹⁹, with Members 1–3 inside the cave and 4–6 now exposed at the surface owing to erosion of the cave roof. Although the complete infill stratigraphy is not exposed in any one place and the temporal relationship between the interior and surface deposits remains debated¹³,¹⁴, we retain the original nomenclature¹⁹,²⁰ here. We will focus on Member 2 within the Silberberg Grotto (Fig. 1) and on the Oldowan Infill of Member 5 in younger deposits excavated from a higher infill.

Member 2 contains abundant fossils, angular dolomite and chert clasts, and quartz-bearing sand. Several localized flowstones and botryoidal calcite deposits fill cavities that formed after the breccia was cemented and later settled into voids dissolved below (Fig. 1)⁴,¹⁴. Fauna was accumulated as a deathtrap assemblage²¹ including associated elements, largely of primates and carnivores, with no hominids apart from a single near-complete skeleton of Australopithecus prometheus (StW 573; Fig. 2)¹,³,²². This species was named on the basis of a parieto-occipital fossil from Makapansgat²³. It has been suggested²² that several other Sterkfontein and some Makapansgat specimens also belong in this species, making Australopithecus africanus and A. prometheus contemporaries in the assemblages of Makapansgat Member 3 and Sterkfontein Member 4. A. prometheus differs from A. africanus in features including Paranthropus-like larger, bulbous-cusped cheek teeth, a longer, flatter face, incipient supraglabellar hollowing and a more vertical rounded occiput²². (Note that we use the term hominid in the traditional sense to include humans and their ancestral relatives but exclude the great apes.)

Dating of Member 2 and StW 573 has been problematic. Flowstones in the vicinity of StW 573 date to about 2.2 million years (Myr)¹⁰,¹¹, but they post-date the breccia and the fossil⁴,¹⁴. The only previous date on the breccia itself was cosmogenic ²⁶Al/¹⁰Be burial dating of fine-grained quartz⁹, which yielded a best-fit age of 4.17 ± 0.35 Myr. This age has been questioned by many¹⁰⁻¹³,²⁶ who have suggested that fine sediment could have been reworked from older, higher deposits within the cave, making the burial age of the sediment older than the fossil. To resolve the age of the fossil, the breccia must be dated and it must be shown to be a coherent stratigraphic unit, largely free of reworked material. This is now possible owing to improvements in measurement precision and new techniques such as isochron burial dating, which can explicitly validate the coeval deposition of the entire unit²⁴,²⁵.

Member 5 contains both Homo ergaster and Paranthropus fossils as well as Oldowan and Acheulean stone tools⁶,⁷. Member 5 East is divided into a lower Oldowan infill, with the first appearance of stone tools and a few fossils of Paranthropus, and an overlying early Acheulean infill⁷. Faunal comparisons and the Paranthropus hominid StW 566 suggested an age estimate of 1.7–2.0 Myr for the Oldowan infill⁶,⁷. A substantially younger age of 1.32 ± 0.08 Myr (error-weighted mean) has been inferred from electron spin resonance dating of bovid teeth¹². We use burial dating of a quartz manuport to determine the age of the Oldowan infill.

Burial dating is based on the radioactive decay of ²⁶Al and ¹⁰Be in quartz. These nuclides build up by exposure to secondary cosmic radiation near the ground surface, and subsequently decay when sediment is buried and cosmogenic nuclide production is attenuated. Because ²⁶Al ($\tau_{26}$ = 1.021 ± 0.024 Myr (ref. 28)) decays faster than ¹⁰Be ($\tau_{10}$ = 2.005 ± 0.020 Myr (ref. 29)), the ratio ²⁶Al/¹⁰Be decreases over time, with an effective mean-life of $\tau_{\mathrm{bur}}$ = 2.08 ± 0.10 Myr. For burial dating to be accurate, three criteria must be met. (1) The quartz must be exposed near the ground surface before burial to accumulate sufficient ²⁶Al and ¹⁰Be. (2) It must be buried quickly and deeply enough so that post-burial production is small. The exact depth required depends upon the inherited concentrations, but is usually many metres. (3) It must be buried only once in the past ~10 Myr. If quartz has been reworked from older deposits, or if it has been reworked underground within the cave system, then the burial age will overestimate the true age of the deposit.
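The decay of the ²⁶Al/¹⁰Be ratio described above can be illustrated with a minimal Python sketch. It uses only the two mean-lives quoted in the text and assumes the idealized case in which criterion (2) holds exactly, that is, no post-burial production at all. The function names and the starting ratio of 6.8 (the production rate ratio quoted in the Methods) are illustrative choices, not part of the published analysis.

```python
import math

TAU_26 = 1.021  # mean-life of 26Al in Myr (value quoted in the text)
TAU_10 = 2.005  # mean-life of 10Be in Myr (value quoted in the text)


def effective_mean_life(tau_26=TAU_26, tau_10=TAU_10):
    """Mean-life of the 26Al/10Be ratio: 1/tau_bur = 1/tau_26 - 1/tau_10."""
    return 1.0 / (1.0 / tau_26 - 1.0 / tau_10)


def ratio_after_burial(initial_ratio, burial_time_myr):
    """26Al/10Be ratio after burial, assuming no post-burial production."""
    return initial_ratio * math.exp(-burial_time_myr / effective_mean_life())


def burial_age(initial_ratio, measured_ratio):
    """Invert the decay law to get a burial age in Myr (same assumption)."""
    return effective_mean_life() * math.log(initial_ratio / measured_ratio)


tau_bur = effective_mean_life()
print(f"tau_bur = {tau_bur:.2f} Myr")

# Hypothetical sample that starts at a ratio of 6.8 and is buried for 3 Myr.
buried_ratio = ratio_after_burial(6.8, 3.0)
print(f"ratio after 3 Myr = {buried_ratio:.2f}")
print(f"recovered age = {burial_age(6.8, buried_ratio):.2f} Myr")
```

Real samples rarely satisfy the no-production assumption exactly, which is why the isochron approach solves for post-burial production instead of neglecting it.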

An elegant way to test whether the burial dating criteria are met is to construct an isochron²⁴,²⁵ in which multiple samples are analysed from the same location. Each sample is buried with its own inherited ²⁶Al and ¹⁰Be concentrations, but all samples share the same post-burial production history. A plot of ²⁶Al versus ¹⁰Be yields a gentle curve with a slope that indicates burial age and an intercept that depends on the amount of post-burial production²⁴. The isochron burial dating method accounts for post-burial production without requiring detailed knowledge of the burial depth or burial history. It also allows outliers to be identified; reworked samples plot below the isochron, while samples significantly above the isochron are forbidden and indicate issues with either the sample or the laboratory measurements.

Figure 1 | Stratigraphy and sample locations. Measured stratigraphic section through the Member 2 talus at the location of the StW 573 skeleton showing locations of dated samples, modified from ref. 14. Locations of U/Pb samples are estimated from schematic sections of refs 10, 11; palaeomagnetic samples were located from refs 8, 12. Inset locates the cross section in the lower part of the Silberberg Grotto, with approximately 1 m contour intervals for the infill surface.

We analysed 11 samples from Member 2 (Table 1), including three previously reported⁹. Effective isochron burial dating requires a wide range of inherited cosmogenic nuclide concentrations. To that end, we selected a suite of samples to maximize variability. Fine quartz sand from multiple samples (ST 1–9) was probably washed in from the surface. In contrast, four blocks of chert were collected from the immediate vicinity of StW 573 (M2CA–D). Two fractions of coarse sand and pebbles were separated (ST M2 Dark and Light). One fraction comprises rounded grains stained with pedogenic iron oxides and washed into the cave from soil at the surface; the second comprises angular unstained grains probably eroded from the walls and ceiling of the cave itself (Extended Data Fig. 1). A previously reported sample from the modern surface⁹ was analysed to confirm that material enters the cave with a zero burial age.

From the Oldowan Infill of Member 5 we selected a single quartz manuport, a typical vein quartz cobble with rounding and impact marks characteristic of rocks found in the local river gravels close to Sterkfontein (Extended Data Fig. 2). There is no evidence for reworking of older deposits, as there are no diagnostically younger artefacts within the large Oldowan assemblage of 3,500 pieces⁶,⁷.

²⁶Al and ¹⁰Be were measured by accelerator mass spectrometry (AMS). All samples of fine sand and the iron-stained grains have high ²⁶Al and ¹⁰Be concentrations, confirming their origin from outside the cave. Light-coloured grains and chert blocks have low concentrations, indicating that they were probably eroded from the walls of the cave within a few metres of the surface. A plot of ²⁶Al versus ¹⁰Be (Fig. 3) reveals that all but two of the samples lie on an isochron, consistent with a single episode of deposition. One chert block lies below the isochron, indicating that it was reworked from an older deposit within the cave, perhaps from talus of Member 1, nearby. Another chert sample has a ²⁶Al/¹⁰Be ratio far into the forbidden zone above the isochron, indicating a problem. Because this was a small sample there is no remaining chert for re-analysis; it is not included in the age determination.

Figure 2 | Skull of StW 573 (‘Little Foot’). The skull, recently extracted from the cave breccia. Photo by Jason Heaton.

Table 1 | Samples and cosmogenic nuclide concentrations

Sample        Location                      [¹⁰Be] (10⁶ atoms per gram)*   [²⁶Al] (10⁶ atoms per gram)
ST 1          0.7 m below StW 573           0.493 ± 0.026                  0.624 ± 0.053
ST 2          Adjacent to StW 573           0.574 ± 0.025                  0.565 ± 0.050
ST 3          0.8 m above StW 573           0.522 ± 0.029                  0.562 ± 0.122
ST 7          Surface above cave            1.166 ± 0.020                  7.075 ± 0.380
ST 8          0.7 m NW of StW 573           0.685 ± 0.137                  0.686 ± 0.074
ST 9          2–2.5 m below StW 573         0.479 ± 0.012                  0.550 ± 0.036
ST M2 Dark    From samples ST 1, 2, 8, 9    0.354 ± 0.025                  0.412 ± 0.044
ST M2 Light   From samples ST 1, 2          0.118 ± 0.005                  0.205 ± 0.015
M2CA          Near StW 573                  0.101 ± 0.004                  0.099 ± 0.009
M2CB          Near StW 573                  0.070 ± 0.004                  0.179 ± 0.015
M2CC          Near StW 573                  0.043 ± 0.002                  0.083 ± 0.012
M2CD          Near StW 573                  0.157 ± 0.006                  0.955 ± 0.036
Manuport      Oldowan Infill, Member 5      1.623 ± 0.070                  3.051 ± 0.295

*All ¹⁰Be measurements adjusted to the standard of ref. 30. ST 1–3 are slightly different than reported in ref. 9 owing to inclusion of additional analyses.

The burial age for Member 2 is calculated as 3.67 ± 0.16 Myr. The concentration of ¹⁰Be produced after burial is calculated as (21 ± 3) × 10³ atoms per gram, corresponding to a post-burial production rate of about 0.012 atoms of ¹⁰Be per gram per year, a value consistent with deep burial. The burial age of the surface sample is 0.11 ± 0.11 Myr, consistent with zero. Its concentrations indicate a surface erosion rate of 5.5 ± 0.5 m Myr⁻¹ for ¹⁰Be and 6.0 ± 0.6 m Myr⁻¹ for ²⁶Al.
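Two of the numbers in this paragraph can be checked with a short Python sketch. It is not the authors' calculation: it assumes a constant post-burial production rate, takes the surface production rates (10.8 and 73.1 atoms per gram per year) and the steady-erosion inherited ratio from the Methods, and ignores post-burial production for the surface sample. The function names and the fixed-point iteration are illustrative choices.

```python
import math

TAU_26 = 1.021e6  # 26Al mean-life in years (text value)
TAU_10 = 2.005e6  # 10Be mean-life in years (text value)
TAU_BUR = 1.0 / (1.0 / TAU_26 - 1.0 / TAU_10)
P10, P26 = 10.8, 73.1  # surface production rates from the Methods, atoms/g/yr


def post_burial_rate(n10_post_burial, age_yr):
    """Constant 10Be production rate that builds n10_post_burial atoms/g
    over age_yr, from N = P * tau * (1 - exp(-t / tau))."""
    return n10_post_burial / (TAU_10 * (1.0 - math.exp(-age_yr / TAU_10)))


def simple_burial_age(n10, n26, iterations=50):
    """Burial age in years, assuming no post-burial production.

    The inherited ratio assumes steady erosion at the source and is
    recomputed on each pass from the decay-corrected 10Be concentration.
    """
    age = 0.0
    for _ in range(iterations):
        n10_at_burial = n10 * math.exp(age / TAU_10)
        inherited_ratio = (P26 / P10) / (1.0 + n10_at_burial / (P10 * TAU_BUR))
        age = TAU_BUR * math.log(inherited_ratio / (n26 / n10))
    return age


# Member 2 best fit: 21,000 atoms/g of post-burial 10Be over 3.67 Myr.
rate = post_burial_rate(21e3, 3.67e6)
# Surface sample ST 7 from Table 1.
surface_age = simple_burial_age(1.166e6, 7.075e6)
print(f"post-burial 10Be rate: {rate:.3f} atoms/g/yr")
print(f"surface sample burial age: {surface_age / 1e6:.2f} Myr")
```

Under these assumptions the rate comes out close to the quoted 0.012 atoms per gram per year, and the surface sample age falls within the quoted 0.11 ± 0.11 Myr.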

Several factors have contributed to lowering the age of Member 2 from that previously reported for sample ST 2 (4.17 ± 0.35 Myr)⁹, even though its ¹⁰Be and ²⁶Al concentrations did not change substantially. Since the time of the previous publication the mean-life of ¹⁰Be has been re-evaluated and raised from 1.93 Myr to 2.005 Myr (ref. 29), decreasing the burial age. In addition, post-burial production by muons was previously overestimated, making the inferred burial age too old. Although production rates by muons at depth have been revised²⁷, the isochron method explicitly solves for post-burial production and avoids the need for theoretical production rate calculations, making the method inherently more robust. Finally, rather than relying on a single sample, the new calculations consider nine samples simultaneously; using revised values sample ST 2 alone would yield an age of 3.94 ± 0.20 Myr, older than but well within measurement uncertainty of the joint solution.

Figure 3 | Burial dating isochron. Cosmogenic ²⁶Al and ¹⁰Be concentrations for individual samples from Member 2, shown as 1σ error ellipses. The solid curve shows the error-weighted best fit, and dashed curves illustrate 1σ error bounds. One sample shown as an open symbol lies below the isochron and has been reworked from an older deposit. A single outlier lies far above the line and has been excluded from analysis. The remaining nine samples are all consistent with a single age of deposition at 3.67 ± 0.16 Myr ago.

The new age of the Member 2 breccia and the StW 573 skeleton encased within it is in accordance with stratigraphic and taphonomic data¹⁴,²¹ suggesting that they are older than Member 4 with its abundant Australopithecus fossils. StW 573 thus represents an earlier individual that is older than similar fossils from Makapansgat and contemporary with some A. afarensis fossils such as at Laetoli¹⁵, and a partial skeleton from Woranso-Mille, Ethiopia¹⁶. The demonstration that A. prometheus in South Africa was contemporary with the morphologically very different A. afarensis of eastern Africa now raises interesting questions about early hominid diversity and phylogenetic relationships.

The burial age for the manuport from the Oldowan infill, calculated for its current burial depth of 7 m and a surface erosion rate of 5 m Myr⁻¹, is 2.18 ± 0.21 Myr. The Oldowan at Sterkfontein is now placed at a time compatible with sites elsewhere in Africa, near 2 Myr ago, and with the date of approximately 1.8 Myr ago at Wonderwerk¹⁸. It is close to the cosmogenic burial age of 2.19 ± 0.08 Myr for a manuport found in the Lower Bank of Member 1 at Swartkrans¹⁷, only about 1 km away. Taken together, these dates show that Oldowan technology was present in South Africa by 2 Myr ago.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

Samples of breccia were first treated with acid to dissolve carbonate cement and dolomite blocks. Fine quartz sand (<0.25 mm) was sieved for dating because it contained fewer pieces of dark chert visible by eye. Later, coarse sand and pebbles stained with iron oxides were picked by hand from samples ST 1, 2, 8 and 9. Light-coloured angular sand and pebbles were separated by hand from samples ST 1 and 2 (Extended Data Fig. 1). The manuport (Extended Data Fig. 2) was cleaned and crushed to less than 0.5 mm. Quartz from all samples was purified by repeated leaching in hot agitated 1% HF/HNO₃.

The clean quartz fractions from all samples were dissolved in 5:1 HF/HNO₃ and spiked with ⁹Be prepared from beryl. Upon dissolution, an aliquot was taken for stable Al determination. The sample was then evaporated and fumed to dryness in H₂SO₄. Be and Al were extracted by ion exchange chromatography. Both elements were precipitated as hydroxides and calcined at approximately 1,100 °C for 1 h following standard procedures.

²⁶Al/²⁷Al and ¹⁰Be/⁹Be were measured by AMS at the Purdue Rare Isotope Measurement Laboratory (PRIME Lab), Purdue University. ²⁶Al/²⁷Al measurements for samples ST 1–3 originally reported in ref. 9 were performed at Lawrence Livermore National Laboratory; measurements reported here were made at PRIME Lab in 2014 on archived Al₂O₃ from the same samples. Stable Al measurements for samples ST 1–3 were made by flame atomic absorption spectrophotometry; all others were made by inductively coupled plasma-optical emission spectrometry (ICP-OES). A conservative uncertainty of 5% was assigned to the atomic absorption spectrophotometry measurements, and 2% to measurements by ICP-OES. All ²⁶Al measurements except three (ST 3, M2CC and Manuport) were made in 2014 using a gas-filled magnet. The gas-filled magnet suppresses isobaric interference from ²⁶Mg and allows injection of the AlO⁻ molecular ion into the AMS, resulting in 10–20 times higher beam current and improved precision.

Because measurements were made over a period spanning more than a decade, there have been changes in the AMS standards that must be accounted for. Measurements reported in ref. 9 were normalized to ¹⁰Be standards prepared from a standard solution from the National Institute of Standards and Technology. All others were normalized at the time of measurement to standards prepared in ref. 30. All ¹⁰Be values were adjusted to match the currently accepted values of ref. 30. All measurements of ²⁶Al/²⁷Al were normalized to standards of ref. 28.

A derivation of the isochron dating method employed here is provided in detail in ref. 24. It is based on equation (1), which shows that cosmogenic ²⁶Al and ¹⁰Be concentrations depend on the decayed inherited concentrations and any accumulation after burial.

$$\frac{N_{26} - N_{26,\mathrm{post\text{-}burial}}}{N_{10} - N_{10,\mathrm{post\text{-}burial}}} = \frac{N_{26,\mathrm{inh}}}{N_{10,\mathrm{inh}}}\,\exp(-t/\tau_{\mathrm{bur}}) \tag{1}$$

In equation (1) $N$ represents concentration, the numeric subscripts represent ²⁶Al and ¹⁰Be, and the subscripts post-burial and inh represent cosmogenic nuclide accumulation that postdates and predates burial. The variable $t$ represents burial age and $\tau_{\mathrm{bur}}$ is given by $1/\tau_{\mathrm{bur}} = 1/\tau_{26} - 1/\tau_{10}$.
The inherited ratio in equation (1) can be determined by assuming that the rocks being dated were derived from a steadily eroding landscape. In this case, the ratio is governed by equation (2), expressed as a function of $N_{10}$, where $P$ represents the cosmogenic nuclide production rate at the sediment source area.

$$\frac{N_{26,\mathrm{inh}}}{N_{10,\mathrm{inh}}} = \frac{P_{26}/P_{10}}{1 + N_{10}/(P_{10}\,\tau_{\mathrm{bur}})} \tag{2}$$
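For readers who want to see where equation (2) comes from, the steps can be sketched as follows. This is the standard steady-erosion argument rather than a derivation given in the text; the erosion rate $\varepsilon$, rock density $\rho$ and attenuation length $\Lambda$ are symbols introduced here for illustration, and $N_{10}$ is read as the concentration at the time of burial.

```latex
% Steady-state concentration of nuclide i under constant erosion;
% \varepsilon, \rho and \Lambda are assumed symbols, not defined in the source.
\begin{align*}
N_i &= \frac{P_i}{1/\tau_i + \rho\varepsilon/\Lambda}, \qquad i \in \{10, 26\} \\
% Solve the 10Be expression for the erosion term.
\frac{\rho\varepsilon}{\Lambda} &= \frac{P_{10}}{N_{10}} - \frac{1}{\tau_{10}} \\
% Substitute into the 26Al expression and use 1/\tau_bur = 1/\tau_26 - 1/\tau_10.
N_{26} &= \frac{P_{26}}{1/\tau_{26} - 1/\tau_{10} + P_{10}/N_{10}}
        = \frac{P_{26}}{1/\tau_{\mathrm{bur}} + P_{10}/N_{10}} \\
% Divide by N_10 to obtain equation (2).
\frac{N_{26}}{N_{10}} &= \frac{P_{26}}{N_{10}/\tau_{\mathrm{bur}} + P_{10}}
        = \frac{P_{26}/P_{10}}{1 + N_{10}/(P_{10}\,\tau_{\mathrm{bur}})}
\end{align*}
```

The erosion rate drops out of the final expression, which is why the inherited ratio can be written in terms of the ¹⁰Be concentration alone.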


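The ratio $N_{26,\mathrm{post\text{-}burial}}/N_{10,\mathrm{post\text{-}burial}}$ can be modelled using equation (3), assuming a constant production rate over the entire burial episode.

$$\frac{N_{26,\mathrm{post\text{-}burial}}}{N_{10,\mathrm{post\text{-}burial}}} = \frac{P_{26}\,\tau_{26}\,[1 - \exp(-t/\tau_{26})]}{P_{10}\,\tau_{10}\,[1 - \exp(-t/\tau_{10})]} \tag{3}$$

Combining equations (1)–(3) leads to an expression for an isochron in which $N_{26}$ is a function of $N_{10}$ and only two unknowns: $t$ and $N_{10,\mathrm{post\text{-}burial}}$.

$$N_{26} = (N_{10} - N_{10,\mathrm{post\text{-}burial}})\,\frac{(P_{26}/P_{10})\,\exp(-t/\tau_{\mathrm{bur}})}{1 + N_{10}\exp(t/\tau_{10})/(P_{10}\,\tau_{\mathrm{bur}})} + N_{10,\mathrm{post\text{-}burial}}\,\frac{P_{26}\,\tau_{26}\,[1 - \exp(-t/\tau_{26})]}{P_{10}\,\tau_{10}\,[1 - \exp(-t/\tau_{10})]} \tag{4}$$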


Equation (4) can be used with a suite of samples to solve for both the burial age and the post-burial component of cosmogenic nuclides. We used equation (4) to solve for the age of the Member 2 breccia, with uncertainties determined by Monte Carlo analysis. The best fit age is 3.67 ± 0.16 Myr, and the best fit value for $N_{10,\mathrm{post\text{-}burial}}$ is (21 ± 3) × 10³ atoms per gram. The solution is shown graphically in Fig. 3.
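The way equation (4) constrains the two unknowns can be sketched in Python. This is a simplified stand-in for the published fit, not a reproduction of it: it uses a brute-force grid search, weights residuals by the ²⁶Al uncertainty only, and skips the Monte Carlo step. The production rates are the surface values from the Methods, used here as the source-area rates by assumption, and the samples are synthetic and noise-free rather than taken from Table 1.

```python
import math

TAU_26 = 1.021e6  # 26Al mean-life in years
TAU_10 = 2.005e6  # 10Be mean-life in years
TAU_BUR = 1.0 / (1.0 / TAU_26 - 1.0 / TAU_10)
P10, P26 = 10.8, 73.1  # atoms/g/yr, assumed source-area production rates


def isochron_n26(n10, age, n10_pb):
    """Equation (4): 26Al expected for a sample with measured 10Be n10.

    age is the burial age in years (must be > 0); n10_pb is the post-burial
    10Be concentration shared by all samples.
    """
    inherited = (P26 / P10) * math.exp(-age / TAU_BUR) / (
        1.0 + n10 * math.exp(age / TAU_10) / (P10 * TAU_BUR))
    post_burial = (P26 * TAU_26 * (1.0 - math.exp(-age / TAU_26))) / (
        P10 * TAU_10 * (1.0 - math.exp(-age / TAU_10)))
    return (n10 - n10_pb) * inherited + n10_pb * post_burial


def fit_isochron(samples, ages, pb_values):
    """Grid search minimising chi-square on 26Al only.

    samples is a list of (n10, n26, sigma26) tuples in atoms per gram.
    Returns (chi2, age, n10_pb) for the best grid point.
    """
    best = None
    for age in ages:
        for pb in pb_values:
            chi2 = sum(((n26 - isochron_n26(n10, age, pb)) / s26) ** 2
                       for n10, n26, s26 in samples)
            if best is None or chi2 < best[0]:
                best = (chi2, age, pb)
    return best


# Synthetic samples generated from a known answer, to show the fit recovers it.
TRUE_AGE, TRUE_PB = 3.0e6, 20e3
synthetic = [(n10, isochron_n26(n10, TRUE_AGE, TRUE_PB), 0.05 * n10)
             for n10 in (0.1e6, 0.3e6, 0.5e6, 0.7e6)]
ages = [2.0e6 + 0.05e6 * i for i in range(41)]  # 2.0 to 4.0 Myr
pb_values = [2e3 * i for i in range(21)]        # 0 to 40,000 atoms/g
chi2, age, pb = fit_isochron(synthetic, ages, pb_values)
print(age / 1e6, pb)
```

A fit to real data would also need to account for the ¹⁰Be uncertainties and would typically use a proper optimizer plus Monte Carlo resampling, as the text describes.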

For the Oldowan Infill, with only one sample, it is not possible to use an isochron. We corrected for post-burial production beneath an eroding surface following ref. 17. Post-burial production rates for a burial depth of 7 m and a bulk density of 2.0 g cm⁻³ were calculated using a multi-exponential profile adjusted for the muon cross sections given in ref. 27. We calculated the burial ages three ways (Extended Data Table 1): a minimum age was calculated that ignored post-burial production completely, and would be correct if erosion rates (ε) at the site were extremely fast; a maximum burial age was calculated by assuming that the burial depth had not changed over time, that is, that the erosion rate was zero; finally, an optimum age was calculated using a reasonable value for erosion of the ground surface, which caused the burial depth to change over time. We assume that the ground surface eroded at 5 m per million years, consistent with the value reported here. Cosmogenic nuclide production rates of 10.8 and 73.1 atoms per gram per year at the surface (for ¹⁰Be and ²⁶Al, respectively) were calculated for a latitude of 26° S and an elevation of 1,500 m, with a ²⁶Al/¹⁰Be production rate ratio of 6.8. Previous work using this method at Swartkrans nearby¹⁷ has yielded burial ages concordant with U/Pb ages of capping flowstones, supporting its accuracy.

Reported uncertainties are measurement errors only. We do not include
uncertainties in cosmogenic nuclide production rates (which are generally minor for
burial dating), in the ²⁶Al/¹⁰Be production rate ratio or in radioactive mean-lives.
Accounting for uncertainty in the mean-lives would lead to an additional ~5%
systematic uncertainty in the final ages, resulting in ages with total uncertainties of
3.67 ± 0.24 Myr for Member 2 and 2.18 ± 0.24 Myr for the manuport.
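
The total uncertainties quoted above are consistent with adding the measurement error and a 5% systematic term in quadrature. The text does not state how the two terms were combined, so the quadrature sum in this short check is an assumption, and the function name is hypothetical.

```python
import math


def total_uncertainty(age_myr, measurement_err_myr, systematic_fraction=0.05):
    """Combine measurement and systematic errors in quadrature (assumed)."""
    systematic_err_myr = systematic_fraction * age_myr
    return math.sqrt(measurement_err_myr ** 2 + systematic_err_myr ** 2)


# Ages and measurement errors (Myr) as reported in the text.
for label, age, err in [("Member 2", 3.67, 0.16), ("manuport", 2.18, 0.21)]:
    print(label, round(total_uncertainty(age, err), 2))
```

Under this assumption both samples round to 0.24 Myr, matching the reported values.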
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Extended Data Figure 1 | Hand-picked samples. Dark-coloured and light-coloured
grains separated for samples M2 Dark and M2 Light. Each dish contains grains from
the labelled original sample; M2 lower is sample ST 1, and M2 middle is ST 2.
Light-coloured angular clasts in the top two dishes were combined into sample
M2 Light, while the iron-stained and rounded clasts in the remaining dishes were
combined into sample M2 Dark.
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Extended Data Figure 2 | Manuport. Quartz manuport analysed from the 
Oldowan Infill. Maximum dimension is 67 mm. Sample recovered from Square 
Q57 spit 27′ 8″–28′ 8″.
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Extended Data Table 1 | Burial ages for Oldowan manuport 


Minimum (ε = fast)    Maximum (ε = 0)    Optimum (ε = 5 m/Myr)
2.09 ± 0.20 Myr       2.21 ± 0.21 Myr    2.18 ± 0.21 Myr
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Niche-induced cell death and epithelial phagocytosis 
regulate hair follicle stem cell pool 


Kailin R. Mesa, Panteleimon Rompolas, Giovanni Zito, Peggy Myung, Thomas Y. Sun, Samara Brown, David G. Gonzalez,
Krastan B. Blagoev, Ann M. Haberman & Valentina Greco


Tissue homeostasis is achieved through a balance of cell production 
(growth) and elimination (regression)'”. In contrast to tissue growth, 
the cells and molecular signals required for tissue regression remain 
unknown. To investigate physiological tissue regression, we use the 
mouse hair follicle, which cycles stereotypically between phases of 
growth and regression while maintaining a pool of stem cells to 
perpetuate tissue regeneration’. Here we show by intravital micro- 
scopy in live mice** that the regression phase eliminates the major- 
ity of the epithelial cells by two distinct mechanisms: terminal 
differentiation of suprabasal cells and a spatial gradient of apoptosis 
of basal cells. Furthermore, we demonstrate that basal epithelial 
cells collectively act as phagocytes to clear dying epithelial neigh- 
bours. Through cellular and genetic ablation we show that epithelial 
cell death is extrinsically induced through transforming growth 
factor (TGF)-β activation and mesenchymal crosstalk. Strikingly,
our data show that regression acts to reduce the stem cell pool, as 
inhibition of regression results in excess basal epithelial cells with 
regenerative abilities. This study identifies the cellular behaviours 
and molecular mechanisms of regression that counterbalance 
growth to maintain tissue homeostasis. 

Tissue regression in the hair follicle is thought to be mediated through 
programmed cell death’. However, it is unclear which cells within the 
follicle are removed and whether this process is a result of intrinsic 
cellular exhaustion or active elimination by extrinsic factors. We used 
our established intravital microscopy technique’ to visualize cell beha- 
viours non-invasively in live mice during hair follicle regression (Fig. 1a, 
Extended Data Fig. 1 and Supplementary Video 1). Unexpectedly, time- 
lapse recordings of epithelial nuclei (made visible using H2B-green fluo- 
rescent protein (GFP) driven by the keratin 14 promoter (K14-H2BGFP)) 
revealed a lack of cell death by nuclear fragmentation in the suprabasal 
(inner) layers. Furthermore, time-lapse recordings and genetic lineage- 
tracing approaches showed that inner layers were eliminated through 
upward terminal differentiation® (Fig. 1b, c, Extended Data Fig. 2 and 
Supplementary Video 2). 

In contrast, using live imaging we captured cell death in the basal 
epithelial layer. Furthermore, we found that apoptotic debris was retained 
within the basal epithelium and relocated around neighbouring nuclei, 
suggesting that basal epithelial cells may act as phagocytes to remove 
epithelial cellular debris during hair follicle regression (Fig. 1d and 
Supplementary Video 3). To test this, we induced mosaic expression of 
a cytoplasmic tdTomato fluorescent reporter in the basal layer. This 
showed internalization of tdTomato⁺ epithelial debris into neighbouring
tdTomato⁻ basal epithelial cells (Fig. 1e). Ultrastructure analysis confirmed
phagocytosis of apoptotic bodies by basal epithelial cells (Fig. 1f
and Extended Data Fig. 3). Tracking this process in real time with 
cytoskeletal and nuclear labelling demonstrated that apoptotic debris 
from a single cell was dispersed within the surrounding epithelium and 
collectively internalized by neighbouring basal epithelial cells (Fig. 1g 


and Supplementary Videos 4-6). Consistent with these findings, pro- 
fessional phagocytes’ were neither present inside the regressing hair 
follicles nor did they colocalize with epithelial cell debris (Extended 
Data Fig. 4 and Supplementary Videos 7, 8). Taken together, these data 
reveal two modes of epithelial cell elimination during hair follicle 
regression. While suprabasal cells undergo terminal differentiation, 
basal epithelial cells undergo apoptosis and are collectively removed 
by their basal epithelial neighbours. These findings, along with work 
done on the mammary gland'*”’, support a new paradigm of physio- 
logical epithelial self-clearance. 

Thus far, we have demonstrated that the basal epithelium adopts new 
cellular behaviours from growth to regression*”. During growth, highly 
mitotic cells fuel downwards extension of the basal epithelium. These 
basal cells, located in the lower follicle, are also more likely to be elimi- 
nated during regression, suggesting a model in which mitotic exhaus- 
tion primes cells for death’?. An alternative model could be that cell 
death is driven by extrinsic cues based on spatial location in the basal 
epithelium. To test these models, we promoted survival intrinsically in 
the basal epithelium using the Wnt/β-catenin signalling pathway,
which is expressed in the suprabasal layers and has been impli- 
cated in survival of these cells’? (Fig. 2a and Extended Data Fig. 5). 
We used a Cre-inducible genetic model to activate β-catenin signalling
ectopically in single cells of the basal epithelium® and track survival 
during regression in vivo (Fig. 2b). Control experiments confirmed a 
spatial bias of cell survival in the upper basal layer, as suggested by previous 
work". Although β-catenin activation was observed to enhance cell survival
throughout the follicle, the spatial bias of cell survival seen in controls
was retained in the β-catenin-activated follicles (Fig. 2c, d). These data
suggest that cell intrinsic factors such as Wnt/β-catenin signalling alone
do not explain the pattern of cell survival observed and implicate extrinsic 
factors to induce cell death in the basal epithelium. 

These results prompted us to ask whether the observed pattern of 
basal cell survival was the result of spatially regulated induction of cell 
death. Quantifications of cell death events in time-lapse recordings of 
various stages of regression revealed an initial localized induction of cell 
death at the bottom of the follicle, which is in direct contact with the hair 
follicle mesenchymal dermal papilla niche (Fig. 3a and Supplementary 
Video 9). Therefore, we hypothesized that interaction with the dermal 
papilla promotes cell death along the basal epithelium of the hair follicle. 
To test this, we used two-photon laser ablation* specifically to remove 
the dermal papilla at the onset of regression and revisited the same hair 
follicles over time (Fig. 3b). Dermal papilla ablation resulted in signifi- 
cantly reduced death of basal epithelial cells as measured by hair follicle 
length when compared to neighbouring unablated hair follicles (Fig. 3c 
and Extended Data Fig. 6). Significant differences in ablated and unab- 
lated hair follicle lengths are seen as early as 2 days after ablation, sug- 
gesting that the dermal papilla directly promotes regression (Fig. 3d). 
The difference in length of ablated and unablated hair follicles could be 
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Figure 1 | Basal epithelial cells collectively act as phagocytes to clear 
neighbouring epithelial cell debris. a, Schematic of hair follicle in regression, 
indicating the basal and suprabasal (inner) epithelial layers, using K14-H2BGFP 
mice. b, Single optical sections showing upward collective movement of inner 
layers in relation to surrounding basal epithelial cells at successive time points, 
2.5 h apart (compare position of yellow and white dashed lines). c, Single-cell 
lineage tracing of inner layer cells during regression (n = 30 cells, in 4 mice). 
Labelled cells were revisited daily. Asterisk indicates mesenchymal dermal 
papilla. d, Single optical sections showing cell death (nuclear fragmentation) at 
successive time points. Note that fragments (green) relocate (white arrow) 
around neighbouring epithelial nuclei (yellow, red and blue). e, Whole-mount 
staining showing engulfment of neighbouring basal epithelial cellular content 
by phalloidin staining (blue) in with mosaic Cre induction in basal layer. 
Nucleus is indicated in green and cytoplasm in red. f, Electron micrograph 
illustrating multiple apoptotic bodies (red arrowhead) present in basal epithelial 
cells. Basal, basal epithelial cell; Der, dermis. Inset shows high-magnification 
electron micrograph depicting desmosomal junctions (arrowhead) of 
phagocytic epithelial cells. Scale bar, 500 nm. g, Single optical sections of both 
coronal and transverse planes (x,y and x,z) at successive time points 4 min apart 
showing internalization of an apoptotic body (yellow border) by a neighbouring
basal epithelial cell. Nucleus is indicated in red and cell cortex in green. 

h, Scheme of the two modes of elimination of epithelial cells and collective 
phagocytic uptake of basal epithelial apoptotic bodies by neighbouring basal 
epithelial cells during regression. Scale bars, 20 µm unless otherwise indicated.


attributed to a reduction in cell death or a reduction in cell clearance. 
To be able to distinguish the effect of the dermal papilla on these two 
processes, we quantified the number of apoptotic debris sites in ablated 
follicles 2 days after ablation and found that the amount of cellular debris 
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Figure 2 | β-catenin activation is not sufficient to overcome the extrinsic
gradient of basal epithelial survival. a, Wnt/β-catenin activation is restricted
to inner layers. Immunofluorescent staining of Lef1 in regressing hair follicle.
Lef1 is indicated in red and P-cadherin (P-cad) in green. b, Scheme of basal
and inner layer behaviours and β-catenin activation during hair follicle
regression. c, Lineage tracing of basal epithelial cells revisited at the beginning
and end of regression. Representative examples of either a single control or
β-catenin-activated cell traced during regression. d, Graphical representation
of cell survival as a function of initial position within the regressing hair
follicle (n = 235 or 135 in control or β-catenin, respectively, in 4 mice).
Scale bars, 25 µm.


was significantly reduced compared to control follicles at this initial 
time point. The debris generated from these follicles by day 2 was cleared 
by day 4, similar to control follicles, suggesting that cell clearance is 
relatively unaffected by dermal papilla removal (Fig. 3e). Collectively, 
this establishes a functional role for the mesenchymal niche to promote 
basal epithelial cell death. 

To understand the molecular signalling that facilitates basal epithelial 
cell death, we investigated the TGF-β signalling pathway, as exogenous
administration of TGF-β1 ligand has been shown to induce precocious
hair follicle regression’*. We found that TGF-β ligands are expressed
by the mesenchymal dermal papilla, whereas TGF-β signalling is active
in the basal epithelium during the regression phase (Fig. 3f-h and
Extended Data Figs 7, 8a). To test the functional role of TGF-β signalling
in basal epithelial cell death during regression, we conditionally eliminated
TGF-β receptor 1 (TGF-βR1) in the basal layer’® (Extended Data
Fig. 8b, c). Removal of TGF-βR1 at the onset of regression resulted in
aberrant accumulation of basal cells by the end of regression when
compared to control littermates (Fig. 3i-k). Together, these data demonstrate
that extrinsic regulation through TGF-β signalling and epithelial-mesenchymal
crosstalk induces cell death along the basal epithelium
while sparing a restricted pool of stem cells. 

This work raises the question of whether hair follicle regression serves 
to eliminate either exhausted basal cells or functional cells from an 
expanded stem cell pool. To address this question, we used an approach 
to remove the dermal papilla transiently’* during regression (Fig. 4a). 
Strikingly, as neighbouring unablated follicles began a new round of 
growth, dermal-papilla-ablated follicles that had failed to complete 
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Figure 3 | Mesenchymal dermal papilla crosstalk 
and TGF-β signalling are required for cell
death in the basal epithelium. a, Graphical 
representation and quantification of spatial 
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distribution of cell death in the basal layer at three 
stages of regression using time-lapse recordings 
(n = 9 follicles, in 5 mice). DP, dermal papilla. 

b, Scheme of laser ablation experiment. 

c, Sequential revisits of hair follicles after dermal 
papilla ablation during regression. Yellow 
arrowhead indicates laser ablation site. Asterisk 
indicates auto-fluorescence from the two-photon 
laser. d, e, Dot plot quantification of the hair follicle 
length at day 0, 2 and 4 after dermal papilla 
ablation (d) and of the number of apoptotic 

fragmentation sites at day 2 and 4 after dermal
papilla ablation (e) (n = 36 follicles, in 6 mice;
mean ± standard deviation (s.d.)). f, g, Messenger
RNA levels of Tgfb1 ligand expression in the 
mesenchymal dermal papilla (f) and Smad7 
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regression also initiated hair growth from the bottom of their aberrantly 
long basal epithelium. Furthermore, ablated hair follicles appeared grossly 
normal, with proper generation of differentiated suprabasal layers, similar 
to neighbouring unablated hair follicles (Fig. 4b). These findings dem- 
onstrate that basal epithelial cells of the hair follicle are not intrinsically 
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quantification of the hair follicle length (j) and of 
the number of apoptotic fragmentation sites at the 
end of regression (k) (n = 31 follicles, in 4 mice; 
mean ± s.d.). NS, not significant. *P < 0.05,
**P < 0.01, ***P < 0.001 and ****P < 0.0001
indicate a significant difference. Scale bars, 25 µm.
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committed for cell death, but rather retain a capacity to regenerate tissue. 
This suggests that regression functions to reduce an expanded stem cell 
pool following tissue growth. 

We show that physiological regression is an extrinsically regulated 
process that reduces the size of the hair follicle stem cell compartment 
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Figure 4 | Basal epithelial cells targeted for cell death retain regenerative 
potential. a, Scheme of laser ablation experiment. b, Sequential revisits of hair 
follicles after dermal papilla (DP) ablation during the next round of growth 
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(postnatal day (P)22-P35). White arrowhead indicates differentiated inner 
layers. Observations shown represent n = 3 mice. Scale bars, 25 µm.
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while leaving terminal differentiation programs unaffected. Regression 
is regulated through TGF-β signalling initiated by the mesenchymal
niche to induce spatially restricted cell death in the basal epithelium. 
Clearance of apoptotic cells is a self-contained process driven by epithe- 
lial phagocytosis within the regressing basal epithelium. Finally, inhibi- 
tion of regression through transient loss of the mesenchymal 
niche demonstrates that cells throughout the hair follicle basal epithe- 
lium maintain regenerative competency when in proximity to the 
mesenchymal niche (Extended Data Fig. 9). All together, we dem- 
onstrate that tissue regression relies on spatially coordinated cellular 
behaviours, and establish a new understanding of the extrinsic 
regulation that counterbalances tissue growth over the lifespan of an 
organism. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 

Mice. K14-H2BGFP”, Lef1-RFP'* and K14-GFPActin’? were obtained from the Fuchs 
Laboratory. Tefbr !™" mice were obtained from V. Kaartinen'’. Cinnb"™"* mice 
were obtained from M. Taketo”®. Lgr5-CreER (Clevers Laboratory), Shh-CreER (Tabin 
Laboratory), LysM-Cre (Foerster Laboratory), Cx3cr1-GFP (Littman Laboratory) 
and Rosa-stop-tdTomato (Zeng Laboratory) were obtained from Jackson Labora- 
tory (JAX)! **. The Yale Transgenic Facility generated the K14-H2BmCherry mice. 
All studies and procedures involving animal subjects were approved by the Insti- 
tutional Animal Care and Use Committee at Yale School of Medicine and con- 
ducted in accordance with the approved animal handling protocol. Lgr5-CreER 
and Shh-CreER were used to recombine alleles and label cells conditionally within 
specific hair follicle populations and temporally during the regression phase. Cre 
induction for the lineage-tracing experiments was induced with a single intraperitoneal
injection of tamoxifen (1 µg g⁻¹ in corn oil) at postnatal day 14. Tgfbr1
recombination was induced with three intraperitoneal injections of tamoxifen
(100 µg g⁻¹ in corn oil) at postnatal day 10, 12 and 14. Intravital microscopy and
laser ablation procedures were carried out as described previously*”. For lineage- 
tracing experiments, only cells that were unambiguously separated from neigh- 
bouring cells were sampled to ensure the identity of individual lineages. Mice from 
experimental and control groups were randomly selected of either gender for live 
imaging experiments. No blinding was done. All lineage-tracing and ablation exper- 
iments were repeated in at least three different mice. 

Generation of K14H2BmCherry mice. Transgenic mice expressing H2BmCherry 
under the control of the keratin 14 promoter (K14-H2BmCherry) were generated 
using the following procedure. The H2BmCherry insert (provided by D. Egli) was 
amplified by PCR from the TopoTA vector (Life Technologies) using primers
5′-CGGCGGATCCATGCCAGAGCCAGC and 3′-CGCTCTAGATTACTTGTACAGCTCGTCC,
which introduced cleavage sites for BamHI and XbaI restriction
enzymes immediately upstream and downstream, respectively, of the open reading
frame. The 1.1 kb PCR product was inserted between the BamHI and XbaI
sites in the pG3Z*K14cassette vector (provided by E. Fuchs). The resulting transgene
was digested with SacI and SphI, and the 4.3 kb fragment was injected into
blastocysts at the Yale Transgenic Facility (T. Nottoli). Chimaeric mice were screened 
initially by PCR and founder mice were selected to establish transgenic mouse 
lines. These initial lines were subsequently screened by histological analysis, and 
the line displaying the highest expression levels of the K14H2BmCherry reporter 
was selected to establish the final colony. 
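
As a quick consistency check on the cloning strategy, the two primers can be searched for the restriction sites they are said to introduce. The sketch below uses the primer sequences as printed above and the standard recognition sequences for BamHI (GGATCC) and XbaI (TCTAGA); the helper name `find_sites` is hypothetical, and the check says nothing about the template or the rest of the construct.

```python
# Primer sequences as printed in the text.
FORWARD_PRIMER = "CGGCGGATCCATGCCAGAGCCAGC"
REVERSE_PRIMER = "CGCTCTAGATTACTTGTACAGCTCGTCC"

# Standard recognition sequences of the two enzymes named in the text.
RECOGNITION_SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA"}


def find_sites(sequence):
    """Return {enzyme: 0-based start index} for each site found in sequence."""
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        position = sequence.find(site)
        if position != -1:
            hits[enzyme] = position
    return hits


print(find_sites(FORWARD_PRIMER))  # {'BamHI': 4}
print(find_sites(REVERSE_PRIMER))  # {'XbaI': 3}
```

The forward primer carries the BamHI site directly upstream of the ATG start codon, and the other primer carries the XbaI site next to the reverse complement of the TAA stop codon (TTA), in line with the description of sites placed immediately upstream and downstream of the open reading frame.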

In vivo imaging and laser ablation. Mice between postnatal day 17 and 35 were 
anaesthetized with intraperitoneal injection of 7 µl g⁻¹ of ketamine/xylazine cocktail
mix (15 mg ml⁻¹ and 1 mg ml⁻¹, respectively, in PBS). Anaesthesia was maintained
throughout the course of the experiment with vaporized isoflurane delivered
by a nose cone as previously described’’. Image stacks were acquired with a LaVision
TriM Scope II (LaVision Biotec) microscope equipped with a tunable Chameleon 
Ultra (Coherent) Ti:Sapphire laser. To acquire serial optical sections, a laser beam 
(740 nm for Alexafluor 350; 940 nm for H2BGFP; 1,040 nm for RFP and tdTomato; 
990 nm for simultaneous excitation of GFPActin and H2BmCherry) was focused 
through a ×20 or ×40 water immersion lens (N.A. 1.0 and 1.1 respectively; Zeiss)
and scanned with a field of view of 0.5 or 0.25 mm², respectively, at 600 Hz. Z-stacks
were acquired in 1-3 µm steps to image a total depth of 150 µm of tissue. We revisited
the same hair follicles in separate experiments as previously described'®. For
time-lapse recordings, serial optical sections were obtained between 1 to 5 min 
intervals, depending on the experimental setup. Laser ablation was carried out with 
the same optics as used for acquisition. An 800 nm laser beam was used to scan 
the target area (10-50 µm²) and ablation was achieved using 30-50% laser power
for ~1 s. Ablation parameters were adjusted according to the depth of the target
(50-100 µm).

Image analysis. Raw image stacks were imported into Fiji (NIH) or Imaris software
(Bitplane/Perkin Elmer) for further analysis. Provided images and Supplementary
Videos are typically presented as a maximal projection of 3-6 µm optical
sections. For visualizing individual labelled cells expressing the tdTomato Cre reporter,
the brightness and contrast were adjusted accordingly for the green (GFP) and red
(RFP/tdTomato) channels and composite serial image sequences were assembled
as previously described. Hair follicle length and labelled cell position values were 
measured from the top of the stem cell compartment. Apoptotic cell tracking anal- 
ysis was performed in Imaris software (Bitplane). 

Electron microscopy. Trimmed skin samples were fixed (2% glutaraldehyde
and 2% paraformaldehyde in 0.1 M sodium cacodylate buffer pH 7.4) for 1h. 
The samples were rinsed in sodium cacodylate buffer and were post-fixed in 
1% osmium tetroxide for 1h. The samples were rinsed and en bloc stained in 
aqueous 2% uranyl acetate for an hour further, followed by rinsing, dehyd- 
rating in an ethanol series to 100%, and rinsing several times in 100% propy- 
lene. Then samples were infiltrated with Embed 812 (Electron Microscopy 
Sciences) resin and baked overnight at 60 °C. Hardened blocks were cut using 


a Leica UltraCut UC7. Sixty-nanometre sections were collected and stained 
using 2% uranyl acetate and lead citrate for transmission microscopy, and 
250-nm-thick sections were stained with either Richardson’s stain or 1% 
Toluidine Blue for light microscopy. For immunolabelled electron microscopy,
dissected skin samples were fixed in 4% paraformaldehyde/0.1% glutaraldehyde
in phosphate buffer for 30 min and then in 4% paraformaldehyde/
phosphate buffer overnight at 4 °C. The samples were rinsed in 0.1 M HEPES.
To quench aldehydes, samples were placed in 50 mM NH₄Cl plus 100 mM glycine plus
2% sucrose for 1h, then washed in HEPES buffer and placed in 0.1% tannic 
acid/0.1 M HEPES for 1h, then rinsed in 50mM Tris/50mM maleate and 
placed in 2% uric acid/50 mM Tris/50 mM maleate for 1h. After rinsing, they 
were dehydrated through a graded series 50% to 95% of ethanol at 4 °C, then 
infiltrated with 50:50 ethanol/LR White (EMS) for 1h followed by several 
changes of pure 100% LR White overnight on a rotator at 4 °C. Samples were 
polymerized at 60°C for 18h. Fifty-nanometre resin sections were cut on a 
Leica UC7 ultra-microtome and collected on nickel formvar/carbon grids, and 
immunolabelled using a primary chicken anti-GFP (Abcam) diluted to 1:50 
for 1h, rinsed and placed on protein A gold secondary 1:50 (University of 
Utrecht). The sections were counterstained with 2% uranyl acetate and lead 
citrate. Grids were viewed on an FEI Tecnai Biotwin TEM at 80 kV. Images were
taken using Morada CCD and iTEM (Olympus) software. 

Immunostaining on paraffin sections and whole-mount skin. Skin was fixed in 
4% PFA for whole mount or in 10% formalin for paraffin embedding and used 
for histological analysis as previously described”*. Immunohistochemistry was 
performed by incubating sections at 4 °C overnight with primary antibodies as 
follows: mouse anti-B-catenin (1:100, BD #610153; 14/Beta-Catenin), rat anti- 
CD11b (1:250, eBioscience #14-0112; M1/70), goat anti-P-cadherin (1:100, 
R&D #AF761), rabbit anti-pSmad2 (Ser465/467) (1:1,000, Cell Signaling #3108; 
138D4), and rabbit anti-Lef1 (1:100, Cell Signaling #2286; C18A7). pSmad2
immunostaining required TSA Plus kit (PerkinElmer). For bright-field immuno- 
histochemistry, biotinylated species-specific secondary antibodies, followed by 
detection using the ABC kit (Vector Labs) and DAB kit (Vector Labs), were used 
according to the manufacturer’s instructions. M.O.M. kit was used for mouse 
antibodies (Vector Laboratories). Secondary antibodies conjugated with FITC, 
RRX and Cy5 (Jackson Immunoresearch Laboratories) were used at a concentra- 
tion of 1:100 for 1h at room temperature. Alexafluor 350 phalloidin (Life 
Technologies) was used according to the manufacturer’s instructions. 

FACS. Back skins of K14-H2BGFP; Lef1-REP and Lgr5-CreER; Tgfor™ or Tefbr!’*; 
tdTomato; K14-H2BGFP mice were harvested at P12, P16 or P20 and were placed 
dermis down on 0.2% collagenase (Sigma) at 37 °C for 20 min, and then placed on 
0.25% trypsin (Gibco) at 37 °C for 10 min to obtain epithelial cells as previously 
described”’. Cells were stained for 10 min with biotinylated rat anti-CD34 (1:50, 
eBiosciences #14-0341; RAM34), biotinylated rat anti-CD45 (1:50, BD #553077; 
30-F11), biotinylated rat anti-CD117 (1:50, BD #553353; 2B8) and goat anti- 
integrin-9 (1:50, R&D #AF3827). Cells were washed for 5 min and then incu- 
bated with streptavidin-Pacific blue (1:200, Invitrogen) and Alexafluor 647 donkey 
anti-goat IgG (Jackson Immunoresearch Laboratories). Cells were isolated on DAPI 
exclusion and by the following criteria: dermal papilla = REP*, CD34, CD45, 
CD117 , integrin-0.9"; and enriched outer root sheath = RFP, GFPM# using a 
FACSAria II Cell Sorter (BioScience), as previously described”*. Cells were sorted into 
RNA lysis buffer for RNA isolation (RNeasy Mini Kit, Qiagen). FACS profiles were
analysed through FlowJo software. 

RT-qPCR. cDNA was made using Superscript III First-Strand Synthesis kit (Invitrogen).
RT-qPCR was performed in triplicate with SYBR Green I reagents (Invitrogen)
using 5.0 ng cDNA per reaction on the ViiA 7 Real-Time PCR system
(Invitrogen, Life Technologies). Data were analysed by ViiA software, Microsoft
Excel and PRISM. Gene-specific primers were designed and are listed in Sup- 
plementary Table 1. 

Statistical analysis. Data are expressed as percentages, box and whisker plots (error 
bars represent maximum and minimum), or mean ± s.d. An unpaired Student’s
t-test was used to analyse data sets with two groups and *P < 0.05 to ****P < 0.0001 
indicated a significant difference. Statistical calculations were performed using the 
Prism software package (GraphPad). No statistical method was used to predeter- 
mine sample size. 
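
The two-group comparison described above can be sketched with the Python standard library. The sketch below computes the pooled-variance (Student's) t statistic for two independent groups and maps a P value to the asterisk notation used in the figure legends; the function names and the example measurements are hypothetical, and the conversion from t to a P value is left to a statistics package, as was done with Prism in the study.

```python
import math
import statistics


def student_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for an unpaired, equal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    t_value = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t_value, n_a + n_b - 2


def significance_label(p_value):
    """Map a P value to the asterisk notation of the figure legends."""
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for threshold, label in thresholds:
        if p_value < threshold:
            return label
    return "NS"


# Hypothetical follicle lengths (arbitrary units), for illustration only.
ablated = [1.0, 2.0, 3.0]
unablated = [2.0, 3.0, 4.0]
print(student_t_statistic(ablated, unablated))
print(significance_label(0.003))
```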
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Hair follicle regeneration cycle 
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Extended Data Figure 1 | Hair follicle regeneration cycle. Hair follicle 
growth (Anagen) is characterized by downward expansion and generation of 
several epithelial layers. The most external layer, the outer root sheath (ORS), 
consists of relatively undifferentiated basal epithelial cells. Inner layers are 
generated by a committed progenitor pool, the matrix, which gives rise to
several differentiated layers including: companion layer, inner root sheath (IRS) 
and hair shaft. After growth, the majority of the newly formed layers are lost 
during the regression phase (Catagen), leaving a small surviving fraction of 
cells that reconstitute a new stem cell/progeny (Bulge/Hair Germ) 
compartment at the rest phase (Telogen). 
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Extended Data Figure 2 | Hair follicle inner layers resist apoptosis and 
continue upward terminal differentiation. a, Upward movement of hair 
follicle inner layers during growth and regression. Single optical sections show 
upward collective movement of inner layers relative to surrounding basal 
cells at two time points 130 min apart. Compare the position of labelled cells 
and dashed line of basal (red) to inner layers (yellow). b, Upward movement of 
hair shaft during regression. Optical sections of top view (epidermis) and 
side view (hair follicle) at two time points, 1 day apart. Note the extrusion of 
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hair shafts from regressing hair follicles. Observations shown represent n = 3 
mice. c, Companion layer lineage tracing during regression. Representative 
example of matrix progenitors of the companion layer traced during regression 
in Lgr5-CreER;tdTomato;K14-H2BGFP mice (n = 20 or 7 lineages, in 4 mice). 
d, Terminal differentiation of inner layer progenitor cells. Representative 
example of single-cell lineages (n = 35 or 9 lineages, in 3 mice) traced 

during the initial transition of hair follicle growth to regression in 
Shh-CreER;tdTomato;K14-H2BGFP mice. Scale bars, 25 µm.
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Extended Data Figure 3 | Apoptotic bodies are cleared by neighbouring
basal epithelial cells. a, Toluidine-blue-stained section of regressing hair
follicles used for ultrastructure analysis. b, Electron micrograph illustrating
multiple apoptotic bodies (red arrowheads) present in hair follicle basal
epithelium, but absent in inner layers. c, Electron micrograph showing a hair
follicle in regression (white dashed line). d, Electron micrograph showing the
restriction of apoptotic bodies (red arrowheads) and phagocytic activity to the
basal epithelium. e, High-magnification electron micrograph with immunogold
labelling for GFP protein expressed by K14-H2BGFP⁺ cells. Positive GFP
labelling is present in both apoptotic bodies (Ap) and phagocytic basal
epithelial nuclei (n). Observations shown represent n = 2 mice. Scale bars,
25 µm unless otherwise indicated.
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Extended Data Figure 4 | Professional phagocytes are not present in
regressing hair follicles. a, Professional phagocytes do not enter regressing
hair follicles. Single optical sections showing absence of myeloid populations
inside the hair follicle 2.5 h after epithelial cell death (arrowhead) in
LysM-Cre;tdTomato;K14-H2BGFP mice. b, Immunofluorescent staining of
myeloid populations in skin during hair follicle regression. DAPI, blue; CD11b,
red; P-cadherin, green. Observations shown represent n = 4 mice. Scale bar,
25 µm.
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Extended Data Figure 5 | Wnt/β-catenin activity is restricted to the
inner layers during regression. a, b, Immunohistochemistry (a) and
immunofluorescent (b) staining highlighting active (nuclear) β-catenin of
hair follicle inner layers (dashed line) at the onset of regression.
c, Immunofluorescent staining of the Wnt/β-catenin target gene, Lef1, during
hair follicle regression. DAPI, blue; Lef1, red; P-cadherin, green. Asterisk
indicates mesenchymal dermal papilla. Observations shown represent n = 2
mice. Scale bars, 50 µm.
Extended Data Figure 6 | Late and partial mesenchymal dermal papilla
removal does not affect hair follicle regression. a, Sequential revisits of hair
follicles after dermal papilla (DP) ablation during late regression. b, Box plot
quantification of hair follicle length immediately after ablation, 4 days and
11 days after dermal papilla ablation (n = 20 follicles, in 4 mice; error bars
represent maximum and minimum). c, Sequential revisits of hair follicles
after partial dermal papilla ablation during early regression (n = 12 follicles,
in 3 mice). d, Schematic illustration of the results from mesenchymal dermal
papilla ablation experiments. Dermal papilla ablation during early regression
results in failed elimination of the basal epithelium, while the inner layers
continue upward in terminal differentiation, yet dermal papilla ablation during
late regression does not impair hair follicle regression. Asterisk indicates
auto-fluorescence from the two-photon laser. NS, not significant; P < 0.05,
mean ± s.d. Scale bars, 25 μm.
Extended Data Figure 7 | Characterization of TGF-β pathway in mesenchymal dermal papilla and basal epithelial cell populations during regression. a, Schematic of skin digestion and cell isolation with representative images before and after tissue digestion in K14-H2BGFP;Lef1-RFP mice. b, Representative fluorescent-activated cell sorting (FACS) scheme for isolating mesenchymal dermal papilla (DP; RFP+, CD34, CD45, CD117, integrin-α9) and enriched hair follicle basal epithelium (RFP, GFP+) cells. c, Validation of mesenchymal dermal-papilla-sorted population enrichment by Sox2 expression. d, Validation of basal-epithelial-sorted population enrichment by keratin 14 (K14) expression. e, TGF-β ligand 2 and 3 expression in the mesenchymal dermal papilla throughout the hair cycle. TGF-β1 expression in basal epithelium, mesenchymal dermal papilla, and all sorted cells during regression. f, Differential expression of TGF-β target genes: Smad7, Tmeff1, p15INK4B (also known as Cdkn2b) and in the hair follicle basal epithelium throughout the hair cycle (mean ± s.d.; n = 3 technical replicates). Scale bars, 100 μm. *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 8 | Local TGF-β activation during regression and validation of Cre-induced loss of TGF-βR1 expression. a, TGF-β activation shown by immunofluorescent staining of pSmad2 during the transition from hair follicle growth to regression. DAPI, blue; pSmad2, red; P-cadherin, green. Observations shown represent n = 4 mice. b, Representative FACS scheme for isolating tdTomato-Cre-reporter-positive basal epithelial cells (tdTomato+, GFPneg) from Tgfbr1fl/fl and Tgfbr1+/+ mice. c, Expression of TGF-βR1 and the TGF-β target gene, Smad7, in Cre-recombined basal epithelial cells from Tgfbr1fl/fl and Tgfbr1+/+ mice (P < 0.01, mean ± s.d.; n = 3 technical replicates). Scale bar, 50 μm.
Extended Data Figure 9 | Extrinsic induction of hair follicle regression
dictates the regenerative (stem cell) pool. Crosstalk with the mesenchymal
niche during regression results in localized TGF-β activation, promoting a
spatially restricted gradient of cell death in the basal epithelium. Clearance of
apoptotic cells by neighbouring basal epithelial cells results in a limited pool of
surviving stem cells. Inhibition of this regression process results in excessive
amounts of basal epithelial cells capable of fuelling a new round of growth when
in contact with the mesenchymal dermal papilla.
Disruption of DNA-methylation-dependent long gene repression in Rett syndrome

Harrison W. Gabel, Benyam Kinde, Hume Stroud, Caitlin S. Gilbert, David A. Harmin, Nathaniel R. Kastan,
Martin Hemberg, Daniel H. Ebert & Michael E. Greenberg


Disruption of the MECP2 gene leads to Rett syndrome (RTT), a severe
neurological disorder with features of autism. MECP2 encodes a
methyl-DNA-binding protein that has been proposed to function
as a transcriptional repressor, but despite numerous mouse studies
examining neuronal gene expression in Mecp2 mutants, no clear
model has emerged for how MeCP2 protein regulates transcription.
Here we identify a genome-wide length-dependent increase in gene 
expression in MeCP2 mutant mouse models and human RTT brains. 
We present evidence that MeCP2 represses gene expression by bind- 
ing to methylated CA sites within long genes, and that in neurons 
lacking MeCP2, decreasing the expression of long genes attenuates 
RTT-associated cellular deficits. In addition, we find that long genes 
as a population are enriched for neuronal functions and selectively 
expressed in the brain. These findings suggest that mutations in 
MeCP2 may cause neurological dysfunction by specifically disrupt- 
ing long gene expression in the brain. 

To identify common features of genes whose expression is misregu- 
lated in RTT, we surveyed gene expression data sets from studies of Mecp2 
mutant mice, asking if genes that are misregulated when MeCP2 func- 
tion is disrupted have anything in common with respect to histone 
modifications, mRNA expression, sequence composition or gene length. 
No common features were identified for genes that are downregulated 
when MeCP2 function is disrupted; however, we found that genes that 
are upregulated in the Mecp2 knockout brains are significantly longer 
than the genome-wide average (Fig. 1a). The extreme length of the genes 
upregulated in MeCP2 knockout brains is apparent in multiple studies 
performed by different laboratories (Supplementary Table 1). The
misexpression of long genes is a specific feature of the RTT brain, as 
gene sets identified as misregulated in 16 different mouse models of neu- 
rological dysfunction and disease did not display similarly long length 
(Extended Data Fig. 1). 

To determine whether the extent of gene misregulation in Mecp2 mu- 
tant mice is directly correlated with gene length, we interrogated published 
microarray data sets of gene expression and plotted mRNA fold-change 
(MeCP2 knockout compared to wild type) versus gene length. We
found widespread length-dependent misregulation of gene expression 
in MeCP2 knockout brains, with the longest genes in the genome dis- 
playing the highest level of upregulation relative to shorter genes, which 
show a reduction or no change in gene expression (Fig. 1b, c and Ex- 
tended Data Fig. 1). Consistent with previous studies, the magnitude of 
the length-dependent gene misregulation in the absence of MeCP2 is 
small, but widespread (affecting genes across the continuum of gene 
lengths) and reproducibly detected (Fig. 1b and Extended Data Fig. 1). 
Importantly, length-dependent gene misregulation in the MeCP2 knock- 
out is not an artefact of the method of gene expression analysis used, as 
this effect was detected using a variety of methodologies including microarrays,
total RNA-seq, quantitative PCR, and non-amplification-based
nCounter analysis (Fig. 1b, c and Extended Data Fig. 1 and Supplementary
Discussion). Furthermore, these observations are corroborated by the
recent finding that long genes are upregulated in specific neuronal
cell types when MeCP2 function is disrupted.
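
As a minimal sketch of the length-binned analysis behind these plots (200-gene bins advanced in 40-gene steps, as stated in the figure legends), the following Python illustrates the idea. The function name `sliding_window_means` and the input layout are assumptions made for illustration, not the authors' code, and the s.e.m. computed here is the simple within-bin standard error rather than the propagated error described in Methods.

```python
import math
from statistics import mean, stdev


def sliding_window_means(genes, window=200, step=40):
    """Average log2 fold-change in overlapping windows of length-sorted genes.

    genes: iterable of (gene_length_bp, log2_fold_change) pairs (hypothetical layout).
    Returns one (mean_length, mean_log2fc, sem) tuple per window.
    """
    ordered = sorted(genes, key=lambda g: g[0])  # sort genes by length
    bins = []
    for start in range(0, len(ordered) - window + 1, step):
        chunk = ordered[start:start + window]
        fold_changes = [fc for _, fc in chunk]
        # Plain standard error of the mean within the window
        sem = stdev(fold_changes) / math.sqrt(len(fold_changes)) if len(fold_changes) > 1 else 0.0
        bins.append((mean(length for length, _ in chunk), mean(fold_changes), sem))
    return bins
```

Plotting the second element of each tuple against the first, on a logarithmic length axis, would give a curve of the kind shown in Fig. 1c, d.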

Additional copies of MECP2 cause neurological impairment in humans 
(MeCP2-duplication syndrome) and in transgenic mice. We find
that overexpression of MeCP2 in mice leads to the downregulation of
long genes in the brain (Fig. 1b and Extended Data Fig. 1). This further
suggests that MeCP2 directly represses transcription in a
length-dependent manner.

We next investigated if the length-dependent changes in gene expres- 
sion correlate with the onset and severity of RTT pathology. We found 
that misregulation of long gene expression in the brain of MeCP2 
knockout mice is more striking at nine weeks of age than at four weeks 
of age, thus correlating with disease progression (Extended Data Fig. 2).
In addition, when comparing two disease-causing MeCP2 mutations 
(MeCP2(R270X) and MeCP2(G273X)) that differ in the rate and severity 
with which they cause disease, we find that the magnitude of length- 
dependent gene misregulation correlates with the severity of RTT 
phenotypes (Extended Data Fig. 2 and Supplementary Discussion).
Furthermore, we find by microarray, nCounter and qRT-PCR analysis 
that a subtle missense mutation of MeCP2 (Arg 306 to Cys, R306C) that 
causes RTT in humans and disrupts the interaction of MeCP2 with the 
NCoR co-repressor complex leads to length-dependent gene upregulation
in the mouse brain (Extended Data Fig. 1). Finally, we detect
length-dependent gene upregulation in cultured human neurons derived 
from embryonic stem cells lacking MECP2 (ref. 15) and the cortex of 
humans with RTT (Fig. 1d and Extended Data Fig. 2 and Supplementary
Discussion). The close correlation between the occurrence of length-dependent
gene misregulation and RTT-associated phenotypes across
mice and humans suggests that this misregulation contributes to RTT 
pathology. 

To characterize the mechanism by which MeCP2 tempers the expres- 
sion of long genes, we asked if the binding of MeCP2 to methylated 
DNA is important for this process. MeCP2 was identified based on its 
high affinity for methylated cytosine in the context of a CpG dinucleotide
(mCG). In addition to binding mCG, MeCP2 has been suggested
to bind two additional forms of methylated DNA that are enriched in
the brain, hydroxymethylcytosine (hmC) and methylated cytosine
followed by a nucleotide other than guanine (mCH, where H = A or T
or C). Notably, the frequency of hmCG and mCH in the neuronal
genome increases significantly during the same postnatal period in
which the level of MeCP2 protein markedly increases. This suggests
that as neurons mature, MeCP2 could function by binding to hmCG 
and/or mCH marks. Using a DNA electrophoretic mobility shift assay 
(EMSA) we assessed the binding of MeCP2 to various forms of methy- 
lated DNA. Consistent with previous studies, we find that MeCP2 
shows high affinity for DNA containing mCG but not hmCG, suggesting 
that MeCP2 may not bind preferentially to hmCG in neurons (Fig. 2a, 
Figure 1 | Length-dependent gene misregulation in Mecp2 mutant mice
and human RTT brain. a, Boxplots (showing the median (line), second to
third quartiles (box), 1.5× the interquartile range (whiskers), and 1.58×
the interquartile range/√(number of genes)) of gene lengths (RefSeq
transcription start site to termination site) for genes detected as misregulated in
independent studies of Mecp2 mutant mice. HYP, hypothalamus; CB,
cerebellum; AMG, amygdala; HC, hippocampus; STR, striatum; LVR, liver.
MeCP2-induced (blue), genes downregulated in MeCP2 knockout (MeCP2
KO) and upregulated in MeCP2 overexpression (MeCP2 OE) mice. MeCP2-repressed
(red), genes upregulated in MeCP2 KO and downregulated in
MeCP2 OE (see Methods). b, Mean expression changes across brain regions


Extended Data Fig. 3 and Supplementary Discussion). By contrast, 
MeCP2 binds to mCA, hmCA and mCG with relatively high affinity, 
but binds to mCC and mCT with low affinity similar to that of unmeth- 
ylated DNA. This selective, tight binding of MeCP2 to mCG, mCA and 
hmCA suggests that MeCP2 may regulate long gene expression in the 
brain by binding to these sites. We note that thin-layer chromatography 
and Tet-assisted bisulfite sequencing (TAB-seq) analysis suggest that 
hmCA is very rare in the brain. Therefore, in our subsequent investigation
of MeCP2 binding to CA sequences in vivo we focused our
analysis on mCA. However, at genomic sites where CA sequences are 
hydroxymethylated, MeCP2 might also be predicted to bind and regu- 
late gene expression (see Supplementary Discussion). 

To examine whether MeCP2 binds mCA in the brain, we performed 
chromatin immunoprecipitation sequencing analysis (ChIP-seq) of 
MeCP2, comparing the MeCP2 binding profile across the genome to 
base-pair resolution DNA methylation data (see Methods). As previously
reported, we find that MeCP2 binds broadly across the genome.
Nevertheless, within the context of this broad binding, we detect a
relative enrichment of MeCP2 at gene bodies that have a high level of 
mCA (level = (h)mCN/CN within the gene, see Methods), and a de- 
pletion of MeCP2 binding at gene bodies where the level of hmCG is 
high (Extended Data Fig. 4). Notably, long genes (> 100 kb) display a 
strong relationship between mCA levels and MeCP2 ChIP-seq read 
density (Fig. 2b and Extended Data Fig. 4). Higher-resolution analysis of 
MeCP2 ChIP and mCA levels in the frontal cortex revealed increased 
mCA under sites of local MeCP2 enrichment in the genome, support- 
ing the conclusion that MeCP2 binds to mCA in vivo (Extended Data 
Fig. 4). We note that genes containing the highest level of hmCA are also 
enriched for the MeCP2 ChIP signal (Extended Data Fig. 4). Therefore, 


90 | NATURE | VOL 522 | 4 JUNE 2015 


[Figure 1b, d plots. b, bars for genes ≤ 100 kb and genes > 100 kb in MeCP2 KO and MeCP2 OE samples. d, x axis: gene length (kb), logarithmic scale from 1 to 1,000.]


and liver of Mecp2 mutant mice for genes ≤ 100 kb (grey) and > 100 kb
(red) (see Methods and Supplementary Table 1 for sample sizes and other
details). c, d, Genome-wide changes in gene expression assessed by RNA-seq
analysis of mouse cortical tissue from MeCP2 KO (n = 3) compared to wild
type (n = 3) (c) or microarray analysis of human RTT brain samples (n = 3)
compared to age-matched controls (n = 3) (d). In c, d, lines represent mean
fold-change in expression for genes binned according to gene length (200
gene bins, 40 gene step; see Methods); the ribbon is the s.e.m. of each bin.
*P<0.05; **P<0.01; ***P<1X 10", NS, not significant P= 0.05; 
one-sample (a) or two sample (b) t-test, Bonferroni correction. Error bars 
represent s.e.m. 


if owing to limitations of the methods of analysis the amount of hmCA 
within gene bodies is being underestimated, some of the effects of MeCP2 
deletion that are being attributed to MeCP2 binding to mCA might be 
due to MeCP2 binding to hmCA (see Supplementary Discussion). 
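
The gene-body methylation level used here is defined in the text as (h)mCN/CN within the gene. A minimal sketch of one way to compute such a ratio from per-cytosine bisulfite read counts is given below. The tuple layout, the function name `gene_body_level`, and the choice to pool read counts across all sites in the gene are assumptions for illustration; the authors' exact procedure is described in their Methods and is not reproduced here.

```python
def gene_body_level(sites, gene_start, gene_end, context="CA"):
    """Fraction of methylated calls among all calls at cytosines of one context.

    sites: iterable of (position, context, methylated_reads, total_reads)
           tuples (hypothetical layout).
    Returns None when no reads cover the requested context inside the gene.
    """
    methylated = 0
    total = 0
    for position, site_context, meth_reads, all_reads in sites:
        # Keep only cytosines in the requested dinucleotide context and inside the gene
        if site_context == context and gene_start <= position <= gene_end:
            methylated += meth_reads
            total += all_reads
    return methylated / total if total else None
```

Calling the function with `context="CG"` on the same input gives the corresponding mCG level, which makes the per-gene comparison between contexts straightforward.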
To investigate if length-dependent gene repression by MeCP2 requires 
binding to mCA, we assessed whether there is a correlation between 
the degree of misregulation of gene expression upon the disruption of 
MeCP2 function and the levels of DNA methylation within the tran- 
scribed regions of genes (see Supplementary Discussion). We noted a 
trend whereby genes containing high levels of mCA, but not mCG or 
hmCG, are upregulated in the MeCP2 knockout (Extended Data Figs 5 
and 6). We reasoned that if mCA within genes is required for length- 
dependent repression by MeCP2, long genes containing low levels of 
mCA should be largely unaffected in the MeCP2 knockout mice. Con- 
sistent with this prediction, little to no length-dependent upregulation 
of gene expression is observed in MeCP2 knockout brain for genes 
containing low levels of mCA, while long genes with a high density of 
mCA are significantly upregulated in MeCP2 knockout brains. In addi- 
tion, we found that the shortest genes in the genome are not upregulated 
when MeCP2 function is disrupted, even when the average level of 
mCA within their gene body is relatively high (Fig. 2c and Extended 
Data Fig. 6). The requirement for the presence of mCA within long 
genes for the gene to be repressed by MeCP2 is reproducible, as it is 
detected across three MeCP2 knockout brain regions, in gene expres- 
sion data from MeCP2(R306C) and MeCP2 overexpressing mice, and 
in human RTT brain (Fig. 2d and Extended Data Fig. 6). Notably, 
when we plotted the level of mCA versus gene length, we found that 
the density of mCA is higher on average in longer genes compared to 
shorter genes (Extended Data Figs 5 and 6). The enrichment of mCA 
Figure 2 | MeCP2 represses long genes containing high levels of mCA.
a, EMSA analysis of the MeCP2 methyl-binding domain (amino acids 78-162)
binding to ³²P-end-labelled mCA-containing DNA probe incubated with
100-fold excess of unlabelled competitor oligonucleotides containing
unmodified, methylated, or hydroxymethylated cytosines at the dinucleotides
indicated in bold; no competitor indicated by − symbol (see Methods and
Extended Data Fig. 3). b, Boxplots of MeCP2 ChIP-seq read density within
genes > 100 kb plotted by quartile of mCA/CA in the cortex and cerebellum.
c, Mean fold-change in gene expression binned according to gene length in
MeCP2 knockout cortical tissue for genes with high (mCA/CA > 0.034,
top 25%) and low (mCA/CA < 0.031, bottom 66%) mCA levels (left), or binned
according to gene-body mCA/CA levels for long (> 62 kb, top 25%) and short


within long genes may explain why most of these genes are repressed 
by MeCP2 and upregulated in the MeCP2 knockout. 

To test further whether MeCP2 tempers long gene transcription by 
binding to mCA within genes, we asked if elimination of mCA in the 
brain has an effect on gene expression that is similar to that observed in 
the MeCP2 knockout. Recent evidence suggests that Dnmt3a is the enzyme
that catalyses the deposition of mCA in maturing neurons.
We therefore conditionally disrupted the Dnmt3a gene in the brain
to block the accumulation of mCA (Nestin-Cre; Dnmt3afl/fl mice, designated
Dnmt3a cKO, Extended Data Fig. 7 and Supplementary Discussion).
Bisulfite sequencing of cerebellum DNA indicated that methylation
of DNA at CA, but not CG, is eliminated from the genome in the Dnmt3a 
conditional knockout (Fig. 3a). Microarray analysis of cerebella from 
Dnmt3a conditional knockout mice revealed a length- and mCA- 
dependent upregulation of gene expression that is similar to the gene 
(< 16.8 kb, shortest 25%) genes (right). Lines represent mean fold-change in
expression for each bin (200 gene bins, 40 gene step), and the ribbon is s.e.m. 
of each bin. n = 3 per genotype. d, Bar plots of the mean fold-change in 
expression for all genes > 100 kb compared to subsets of genes > 100 kb 
containing low mCA (bottom 50% mCA/CA) or high mCA (top 25% 
mCA/CA) within their gene body. Values shown for mice with the indicated 
Mecp2 genotypes (left) and human RTT brain (right). CTX, Cortex; HC, 
Hippocampus; CB, cerebellum; KO, MeCP2 knockout; OE, MeCP2 
overexpression; R306C, MeCP2 arginine 306 to cysteine missense mutation; 
P< 1X 101% **P <1 X 10°; *P < 0.01; two-tailed t-test, Bonferroni
correction. Error bars represent s.e.m. See Supplementary Table 1 for 
sample size and other details. 


misregulation detected in MeCP2 knockout mice (Fig. 3b and Extended 
Data Fig. 8). While the deletion of Dnmt3a also leads to a decrease in 
methylation at CT and CC, given that MeCP2 selectively binds to mCA 
in vitro, we conclude that reduction of mCA within gene bodies in the 
Dnmt3a conditional knockout probably disrupts length-dependent gene 
repression by MeCP2. Taken together, these findings support a model 
in which Dnmt3a catalyses the methylation of CA in the neuronal 
genome. MeCP2 then binds to these sites within the transcribed regions 
of genes to restrain transcription in a length-dependent manner. 

To characterize how the misregulation of long gene expression con- 
tributes to RTT pathology, we identified a representative set of genes 
that is consistently misregulated in multiple gene expression data sets 
when MeCP2 function is perturbed. Combined analysis of microarray 
studies across multiple brain regions identified 466 MeCP2-repressed 
genes whose expression is consistently upregulated in MeCP2 knockout 
Figure 3 | Disruption of Dnmt3a in the brain leads to length-dependent
upregulation of genes containing high levels of mCA. a, Summary of
genome-wide bisulfite-sequencing analysis of mCN (where N = G, A, T or C)
in control and Dnmt3a cKO cerebella (n = 2 per genotype). Dashed line
represents mean background non-conversion rate of the bisulfite-seq assay
(see Methods). b, Mean fold-change in gene expression versus gene-body mCA
for MeCP2 KO (left) or Dnmt3a cKO (right) cerebella. Long (top 25%, > 60 kb)
and short (bottom 25%, < 14.9 kb) genes were binned according to gene-body
mCA/CA levels. Lines represent mean fold-change in expression for each
bin (200 gene bins, 40 gene step), and the ribbon is s.e.m. of genes within
each bin. ***P < 0.005; two-tailed t-test, Bonferroni correction. Error bars
represent s.e.m. n = 5 per genotype for MeCP2 KO, n = 3 per genotype for
Dnmt3a cKO.
Figure 4 | Analysis of long gene expression and regulation in the brain.

a, Cumulative distribution function of gene lengths for all genes in the genome 
(black), MeCP2-repressed genes (red), and genes encoding putative FMRP 
target mRNAs” (blue); P< 1X 107’ for each gene set versus all genes, 


mice and downregulated in MeCP2 overexpressing mice (Supplemen- 
tary Discussion and Supplementary Table 3). Consistent with the con- 
clusion that MeCP2-repressed genes are targets of gene-length- and 
mCA-dependent repression, these genes are exceptionally long and 
are enriched for mCA (Fig. 4a and Extended Data Fig. 8). Disruption 
of the expression of this gene set is specific to RTT, as these genes were 
not misregulated in data sets obtained from six other mouse models of 
neurological dysfunction (Extended Data Fig. 8). 
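
The definition of the two gene classes (MeCP2-repressed: up in the knockout and down in the overexpressor; MeCP2-induced: the reverse) can be sketched as a simple set operation. This is an illustration only: the function name `classify_mecp2_targets` and the dictionary inputs are hypothetical, and the sketch uses the sign of the log2 fold-change alone, whereas the criteria actually applied to define the 466-gene set are given in the Supplementary Discussion and are not reproduced here.

```python
def classify_mecp2_targets(ko_log2fc, oe_log2fc):
    """Split genes by opposing changes in knockout (KO) and overexpression (OE).

    ko_log2fc, oe_log2fc: dicts mapping gene symbol -> log2 fold-change
                          versus wild type (hypothetical inputs).
    Returns (repressed, induced) as two sets of gene symbols.
    """
    shared = ko_log2fc.keys() & oe_log2fc.keys()  # genes measured in both data sets
    # Repressed by MeCP2: expression rises without MeCP2 and falls with extra MeCP2
    repressed = {g for g in shared if ko_log2fc[g] > 0 and oe_log2fc[g] < 0}
    # Induced by MeCP2: expression falls without MeCP2 and rises with extra MeCP2
    induced = {g for g in shared if ko_log2fc[g] < 0 and oe_log2fc[g] > 0}
    return repressed, induced
```

A real analysis would add a significance filter per data set before taking the intersection; that step is omitted here to keep the logic visible.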

We examined the functional annotations of the 466 MeCP2-repressed 
genes to gain insight into how their disruption might contribute to 
RTT pathology. Many of these MeCP2-repressed genes encode proteins 
that modulate neuronal physiology (for example, calcium/calmodulin- 
dependent kinase Camk2d and the voltage-gated potassium channel 
Kcnh7). In addition, multiple genes involved in axon guidance and 
synapse formation were identified, including Epha7, Sdk1 and Cntn4 
(Extended Data Fig. 8). Consistent with these observations, gene onto- 
logy analysis of MeCP2-repressed genes indicates that they are enriched 
for annotated neuronal functions (for example, post-synaptic density, 
axonogenesis, voltage-gated cation channel activity; Extended Data 
Table 1). These findings suggest that RTT results from a subtle, yet 
widespread overexpression of long genes that have specific functions 
in the nervous system. 

We next considered why the misregulation of long genes as a popu- 
lation in RTT leads specifically to neuronal dysfunction. Many genes 
with neuronal function are very long, raising the possibility that long
genes as a population might be enriched for functions in the nervous 
system relative to other tissues. If so, the high level of mCA and MeCP2 
in neurons may have evolved to temper the expression of long genes 
specifically in the brain. Indeed, gene ontology analysis of all genes in 
the genome above 100 kb indicates that the longest genes in the gen- 
ome are enriched for neuronal annotations (Extended Data Table 1). 
Moreover, by examining tissue-specific gene expression data sets, we 
find that long genes as a population are preferentially expressed in mouse 
and human brain relative to other tissues (Fig. 4b and Extended Data 
Fig. 9). We note that while long genes typically have brain-specific func- 
tion and expression, brain-specific expression is not a prerequisite for 
regulation of long genes by MeCP2 in neurons: some long genes are 
ubiquitously expressed but selectively repressed by MeCP2 in the brain
(Extended Data Fig. 8 and Supplementary Discussion).

To explore if disruption of proteins that regulate long gene expres- 
sion may broadly contribute to autism spectrum disorders (ASDs), we 
asked if a similar misregulation of gene expression occurs in a promi- 
nent ASD, fragile X syndrome (FXS). FXS is caused by inactivation of 
FMRP, a protein that represses mRNA translation in neurons. Strikingly,
we find that FMRP-target mRNAs and the genes that encode them
are significantly longer than the genome average (Fig. 4a, Extended
two-sample Kolmogorov–Smirnov test. b, Mean expression of genes binned
according to length in mouse for neural and non-neural tissues. Line indicates
mean expression for genes within each bin (200 gene bins, 40 gene step);
the ribbon represents the s.e.m. of each bin.


Data Fig. 8 and Supplementary Discussion). Moreover, we detect sig- 
nificant overlap between MeCP2-repressed genes and genes encoding 
FMRP-target mRNAs (Extended Data Fig. 8). These results suggest that 
upregulation of long gene function, either through increased transcrip- 
tion (RTT) or mRNA translation (FXS), may represent a common cause 
of pathology in neurodevelopmental disorders. 
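
The comparison of gene-length distributions in Fig. 4a relies on a two-sample Kolmogorov–Smirnov test. As a hedged sketch, the test statistic (the largest vertical gap between two empirical cumulative distribution functions) can be computed as follows. The function name `ks_statistic` is an assumption for illustration, and the P value calculation is deliberately left out; in practice a statistics library would normally be used for the full test.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D for two lists of numbers."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Here `sample_a` might hold the lengths of all genes and `sample_b` the lengths of a gene set of interest; a value of D near 0 indicates similar distributions and a value near 1 indicates almost no overlap.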

A recent study demonstrated that pharmacological inhibition of 
topoisomerases leads to the broad downregulation of long genes in 
neurons, suggesting that topoisomerase inhibitors might reverse the
upregulation of long gene expression observed in the absence of MeCP2.
To test this, we knocked down MeCP2 expression in cultured cortical
neurons with RNA-mediated interference (RNAi) and treated these 
cells with the topoisomerase inhibitor topotecan. We found that MeCP2 
knockdown leads to the upregulation of long genes and that exposure 
of MeCP2-deficient neurons to topotecan results in a dose-dependent 
reversal of long gene misregulation (Extended Data Fig. 9). 

The disruption of MeCP2 function in both mouse and human neu- 
rons leads to an overall reduction in cell health that can be measured as 
a decrease in the level of ribosomal RNA and cell size. Notably, we
found that the concentration of topotecan that most effectively reverses 
overexpression of long genes (50 nM) partially reverses the decreased 
ribosomal RNA content observed in neurons lacking MeCP2 (Extended 
Data Fig. 9). This result suggests that the rebalancing of long gene 
expression improves cell health in MeCP2 knockdown neurons, lead- 
ing to increased cellular rRNA content. Taken together, these data sug- 
gest that rebalancing long gene expression in neurons lacking MeCP2 
may attenuate the cellular dysfunction observed in these cells. 

Our finding that long genes are misregulated in RTT, and that this 
misregulation can be reversed by topotecan treatment complements a 
recent study implicating topoisomerases in the regulation of long
genes in the brain. Thus, our study provides additional evidence that disruption
of long gene expression may be a general mechanism underlying
ASDs, and suggests that developing methods to rebalance long gene 
expression may be a strategy to correct neural dysfunction in these 
disorders. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Analysis of published MeCP2-regulated gene lists. To search for unique char- 
acteristics of genes found to be misregulated in Mecp2 mutant mice we interro- 
gated the list of genes found to be significantly activated or repressed by MeCP2 in 
the cerebellum of MeCP2 KO and MeCP2 OE mice. Using published data sets for
the mouse cerebellum from ENCODE and other sources, these genes were assessed 
for epigenetic marks at promoters and gene bodies, including histone acetylation 
and methylation as measured by ChIP-seq analysis, as well as DNA methylation 
and hydroxymethylation as measured by affinity purification methods. In addition,
we interrogated sequence attributes of genes, including dinucleotide frequencies,
exon number, repeat density within genes and gene length. To determine if
the misregulated genes were exceptional with respect to any epigenetic marks or 
sequence attributes, they were compared to several sets of control genes selected to 
be matched for gene expression levels (data not shown). Although no obvious epi- 
genetic differences were apparent from this analysis, we detected the extreme length 
of genes (measured as total basepairs from RefSeq transcription start site to tran- 
scription termination site) repressed by MeCP2 (upregulated in the MeCP2 KO and 
downregulated in the MeCP2 OE). We note that affinity-based measures of DNA 
methylation that were used in this initial unbiased search are now known to be 
insensitive to low level methylation at individual cytosines and thus do not report 
mCA levels with high fidelity. This likely explains why we did not detect a methy- 
lation signature for MeCP2-repressed genes using the affinity-based data in our 
initial analysis. Subsequent analysis of multiple published gene lists from several 
brain regions revealed the consistent, extreme length of the genes identified as 
repressed by MeCP2 in each brain region. These findings are presented in Fig. 1a
as boxplots where each plot depicts the median (line), the second through to the
third quartiles (box), 1.5× the interquartile range (whiskers), and 1.58× the
interquartile range/√(number of genes) (notches). The notches on each box approximate
a 95% confidence interval for the median value. Note that opposing changes in
MeCP2 KO and MeCP2 OE published gene lists were used to define genes significantly
activated or repressed by MeCP2 for hypothalamus, cerebellum, and
amygdala tissues. For hippocampus, striatum and liver, MeCP2 KO data alone
had been used to identify gene lists.
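
The notch described above, the median ± 1.58 × the interquartile range/√(number of genes), can be computed directly. The sketch below is illustrative: the function name `boxplot_notch` is an assumption, and the quartiles come from Python's `statistics.quantiles` with the inclusive method, which may differ slightly from the quartile (hinge) definition used by the plotting software in the study.

```python
import math
from statistics import median, quantiles


def boxplot_notch(values):
    """Return (lower_notch, median, upper_notch) for a list of gene lengths."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile cut points
    iqr = q3 - q1
    med = median(values)
    # Half-width of the notch: 1.58 * IQR / sqrt(n)
    half_width = 1.58 * iqr / math.sqrt(len(values))
    return med - half_width, med, med + half_width
```

When the notches of two boxes do not overlap, the two medians are likely to differ, which is how the notches in Fig. 1a are meant to be read.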

To test if long gene misregulation is specific to Mecp2 mutants, we surveyed gene
expression studies profiling models of neurological dysfunction, asking if long 
gene length is a common attribute in gene sets from these studies. We analysed the 
lengths of the lists of up- and downregulated genes identified in these studies or 
if ‘called’ misregulated gene lists were not available, we generated lists using the 
Genespring 12.6 software package (Agilent Technologies) or the Geo2R analysis 
tool (http://www.ncbi.nlm.nih.gov/geo/geo2r/). This analysis did not uncover any 
additional gene sets with similar long length to that of MeCP2 mutant studies 
(Extended Data Fig. 1a), suggesting that misregulation of extremely long genes is 
not a common consequence of cell dysfunction in models of neurodegeneration or
several other neurological diseases. 

To analyse gene expression genome-wide with respect to gene length, CEL files 
containing the raw hybridization data from multiple MeCP2 KO and MeCP2 OE
gene expression studies were downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/;
study details, sample numbers and genotypes are provided in Supplementary
Table 1) and analysed for expression at the gene level using the GeneSpring
software suite (Agilent Technologies) with RMA summarization of ‘core’
probesets. To facilitate unambiguous analysis of individual genes, expression values 
for transcript cluster IDs were filtered to include only transcript clusters that map to 
single RefSeq genes, and expression values for genes with multiple transcript clusters 
were derived by taking the average log2 expression value across all transcript clusters
corresponding to each gene. To facilitate comparison between microarray plat- 
forms, throughout this study we present analysis only for genes represented on all 
microarray platforms; this corresponds to 14,168 genes for mouse, and 17,989 genes 
for human. Although this represents a subset of genes in each genome, we have 
obtained similar results for length-dependent changes in gene expression for 
expanded gene sets covered by individual platforms (data not shown). In addition, 
similar results were obtained using the Affymetrix Power Tools pipeline with 
PLIER as an alternative summarization method. For consistency, microarray data 
for gene expression in human cells were processed using an array summarization 
scheme comparable to that used for the mouse microarray data (RMA). Similar qualitative results 
showing length-dependent gene misregulation were obtained from gene expression 
values generated by Li and colleagues using a normalization scheme that included 
spike-controls’* (summarized transcript expression values were downloaded directly 
from GEO). However, with this normalization procedure, the absolute values of 
fold-change of all genes across the entire genome were downshifted in MECP2 null 
neurons relative to wild-type. For analysis of RTT patient samples, raw CEL files 
from Deng et al.'° were downloaded from GEO, and summarized using the RMA 
function in the R ‘affy’ package. 


To quantify the relationship between fold-change and gene length, we sorted 
genes by the lengths of their immature transcripts (RefSeq annotation) and employed 
a sliding window containing 200 consecutive genes in steps of 40 genes. The log2 
fold-change values for the 200 genes within each length bin were averaged and 
plotted; displayed standard errors for a bin were calculated by propagating the s.e. 
deduced from the bin’s log2-fold-change values and the mean s.e. of the individual 
genes reflecting their sample variability. Null distributions displayed on fold-change 
plots were constructed for each bin from 10,000 random samples of 200 genes 
selected without regard to transcript length. 
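
The binning described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, NumPy is assumed, the per-bin error shown is only the s.e.m. of the fold-changes within the bin (the propagation of each gene's own sample-to-sample s.e. is omitted), and a single null distribution is drawn because sampling without regard to length is the same for every bin.

```python
import numpy as np


def sliding_window_means(lengths, log2_fc, window=200, step=40):
    """Mean log2 fold-change for windows of genes sorted by transcript length.

    Returns (median length, mean log2 FC, s.e.m.) for each window.
    """
    order = np.argsort(lengths, kind="stable")
    sorted_len = np.asarray(lengths, dtype=float)[order]
    sorted_fc = np.asarray(log2_fc, dtype=float)[order]
    starts = range(0, len(sorted_fc) - window + 1, step)
    mid_len = np.array([np.median(sorted_len[s:s + window]) for s in starts])
    means = np.array([sorted_fc[s:s + window].mean() for s in starts])
    # Simplification: within-bin s.e.m. only, no per-gene error propagation.
    sems = np.array([sorted_fc[s:s + window].std(ddof=1) / np.sqrt(window)
                     for s in starts])
    return mid_len, means, sems


def null_band(log2_fc, window=200, n_resamples=10_000, seed=0):
    """Mean and s.d. of window means for genes drawn without regard to length."""
    rng = np.random.default_rng(seed)
    fc = np.asarray(log2_fc, dtype=float)
    draws = np.array([rng.choice(fc, size=window, replace=False).mean()
                      for _ in range(n_resamples)])
    return draws.mean(), draws.std(ddof=1)
```

Plotting `means` against `mid_len` on a logarithmic length axis, with a band of the null mean plus or minus two standard deviations, would give a figure in the style of the fold-change plots referred to in the text.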
RNA sequencing and analysis. Total RNA was prepared from cortex of male 
wild-type and MeCP2 KO mice at 8-9 weeks of age. Formal power analysis was not 
used to predetermine sample size; however, sample size (3 per genotype) was determined 
based on previous detection of length-dependent gene expression effects in 
data sets that used similar sample sizes (see Fig. 1b, c and Extended Data Fig. 1 and 
Supplementary Table 1). Animals were preselected based on genotype before col- 
lection to ensure that paired samples were taken within litters, but collection was 
randomized and the experimenter was uninformed of genotype during collection, 
sample processing, and analysis. Brain samples were dissected on ice in HBSS and 
immediately frozen in liquid nitrogen. To extract RNA, the tissue was thawed in 
TRIzol (Ambion), homogenized, extracted with chloroform, and further purified on 
RNeasy columns (Qiagen) using on-column DNase treatment to remove residual 
DNA as specified in the manufacturer’s instructions. High-throughput sequencing 
of total RNA was performed as a service by BGI America. Briefly, ERCC control 
RNAs (Ambion) were added to samples, and total RNA was depleted of ribosomal 
RNA using the Ribozero rRNA removal kit (Epicentre), heat-fragmented to 200- 
700 bp in length and cloned using uracil-N-glycosylase-based strand-specific cloning. 
cDNA fragments were sequenced using an Illumina HiSeq 2000, typically yielding 
20M-40M usable 49 bp single-end reads per sample (Supplementary Table 1 for 
details). Gene expression levels were assessed using an in-house analysis pipeline 
previously developed for RNA-seq quantification*. After filtering out adaptor and 
low quality reads, reads were mapped using BWA” to the mm9 genome augmented 
by an additional set of splicing targets (~3M sequences of length = 98 bp repre- 
senting all possible mm9 sequences that could cross at least one exon-exon junc- 
tion based on the RefSeq annotation). Samples were normalized based on uniquely 
mapped reads that fell outside of rRNA and noncoding genes in order to avoid 
skewing by spikes in incompletely depleted ribosomal and transfer RNA. Nor- 
malization of each sample was referred to an in-house standard of 10M 35-bp 
reads. Gene expression within exons and other features was quantified as ‘density’, 
defined as read coverage of that feature, equal to the total number of read bases per 
total number of feature bases multiplied by the overall normalization coefficient. 
Units of density are always proportional to RPKM (density = 0.35 × RPKM). 
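
The factor of 0.35 can be checked with a short derivation. This is a sketch under the assumption that the normalization rescales every sample to exactly N_0 = 10^7 reads of l = 35 bases; the symbols R (rescaled number of reads in a feature) and L (feature length in bases) are introduced here for illustration only.

```latex
\begin{align*}
\text{density} &= \frac{\text{read bases in feature}}{\text{feature bases}}
                = \frac{\ell R}{L}, \qquad \ell = 35,\\
\text{RPKM}    &= \frac{R}{(L/10^{3})\,(N_0/10^{6})}
                = \frac{10^{9}\,R}{L\,N_0}, \qquad N_0 = 10^{7},\\
\frac{\text{density}}{\text{RPKM}} &= \frac{\ell\,N_0}{10^{9}}
                = \frac{35 \times 10^{7}}{10^{9}} = 0.35 .
\end{align*}
```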

Average read density within a gene’s exons was taken as a proxy for gene expres- 
sion (for genes with multiple annotated transcripts, exonic loci were unioned together). 
For a given set of samples, a quantile distribution (QD) was constructed from all 
samples’ sorted expression levels, and values from the QD were reassigned to each 
gene according to its rank in each sample. Within each subset of samples corres- 
ponding to wild type (WT), knockout (KO) and so on, each gene was assigned its 
mean log QD value and a standard error over its values for this subset in order to 
quantify its sample-to-sample variability within the subset. Precisely zero expres- 
sion levels were ignored in constructing the QD. The log of the fold-change (FC) 
between subsets for each gene, for example, log (KO/WT), was set to the difference 
of the means of the KO and WT log values for the gene, along with a propagated 
s.e. of the log values (variance equal to the sum of KO and WT variances). For 
consistency, the RNA-seq analysis in this study is presented for the common set 
of genes covered by microarray analyses in previous studies (see above). Similar 
results were obtained for larger sets of genes defined by all RefSeq genes. 
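
The quantile-distribution step and the fold-change calculation can be sketched as follows. This is an illustrative reading of the description, not the in-house pipeline: it assumes the QD is the mean of the sorted values at each rank, that all values are strictly positive (the text ignores exact zeros when building the QD), that logarithms are base 2, and that ties are broken arbitrarily. Function names are hypothetical.

```python
import numpy as np


def quantile_normalize(expr):
    """Replace each sample's values by a shared quantile distribution (QD).

    expr: genes x samples array of strictly positive expression values.
    """
    expr = np.asarray(expr, dtype=float)
    # Assumed QD: mean across samples of the sorted values at each rank.
    qd = np.sort(expr, axis=0).mean(axis=1)
    # Rank of every gene within its own sample (0 = lowest expression).
    ranks = expr.argsort(axis=0).argsort(axis=0)
    return qd[ranks]


def log2_fold_change(qn, ko_cols, wt_cols):
    """log2(KO/WT) per gene with a propagated standard error."""
    logs = np.log2(np.asarray(qn, dtype=float))
    ko, wt = logs[:, ko_cols], logs[:, wt_cols]
    lfc = ko.mean(axis=1) - wt.mean(axis=1)
    se_ko = ko.std(axis=1, ddof=1) / np.sqrt(ko.shape[1])
    se_wt = wt.std(axis=1, ddof=1) / np.sqrt(wt.shape[1])
    # Variance of the difference is the sum of the KO and WT variances.
    return lfc, np.sqrt(se_ko ** 2 + se_wt ** 2)
```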

To confirm that our findings with RNA-seq were robust to the method of quan- 
tification used, we also performed analysis using the spliced transcripts alignment 
to a reference (STAR)™ software to align reads to the mm9 genome and Cufflinks” 
to estimate gene-level expression values as fragments per kilobase of exon model 
per million mapped fragments (FPKM). This analysis yielded results that were 
nearly identical to those generated using our in-house RNA-seq analysis pipeline. 
In addition, we derived similar results using transcripts per million (TPM)” as the 
measure of mRNA levels in place of FPKM (data not shown). 

MeCP2 has previously been implicated in the repression of repeat elements 
across the mammalian genome, raising the possibility that the upregulation of long 
genes we observe in our analysis is a reflection of increased transcription from 
repeat elements or possibly cryptic promoters. To look for changes in the expres- 
sion of repeat RNAs in the MeCP2 KO brain, RNA-seq reads were mapped to the 
genome using Bowtie, keeping reads mapping to multiple sites in the genome. 
Each read was assigned a score of 1/n (n = number of sites a read mapped to). Expression 
values for each repeat family were calculated by adding the scores within each 
repeat (annotated using RepeatMasker) and normalizing to sequencing depth. 
This analysis did not reveal evidence of upregulation of specific repeat classes in 
the MeCP2 KO brain. In addition, to look for evidence of increased expression of 
repeats in connection with longer genes we assessed whether there was increased 
antisense transcription in these genes using our in-house RNA-seq analysis pipe- 
line. This analysis failed to provide evidence of increased antisense transcription. 
Another alternative explanation for our results was that the increase in expression 
of long genes we observe is due to spurious transcription, which might initiate 
from cryptic promoters within genes to generate sense coding, incomplete RNAs. 
In this case the upregulated RNAs would not reflect mature protein coding mRNA 
levels. To assess the expression of mature mRNA directly we measured mRNA 
expression by quantifying only RNA-seq reads that map across exon splice junctions. 
Consistent with there being an upregulation of mature mRNAs from long genes in 
the MeCP2 KO, this analysis yielded genome-wide length-dependent upregula- 
tion of gene expression that is highly similar to our whole-exon-based approach 
described above (data not shown). We conclude from this analysis that functional, 
protein-coding mRNAs derived from long genes are upregulated in the MeCP2 
KO, and that this increase is likely due to an alteration in canonical genic tran- 
scription mechanisms, not an increase in spurious transcripts coming from long 
gene loci. 
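
The fractional 1/n scoring of multi-mapping reads used in the repeat analysis above can be sketched as below. The input layout (a dictionary from read ID to the repeat family hit at each mapped site, with `None` for sites outside annotated repeats) and the per-million scaling are assumptions made for illustration; the text states only that scores were normalized to sequencing depth.

```python
from collections import defaultdict


def repeat_family_expression(alignments, total_reads):
    """Sum 1/n scores per repeat family and scale to reads per million.

    alignments: dict mapping read ID -> list with one entry per mapped site,
                each entry a repeat family name or None (not in a repeat).
                Every read is assumed to have at least one mapped site.
    total_reads: sequencing depth used for normalization.
    """
    scores = defaultdict(float)
    for sites in alignments.values():
        weight = 1.0 / len(sites)  # each of the n sites gets a score of 1/n
        for family in sites:
            if family is not None:
                scores[family] += weight
    return {fam: s * 1e6 / total_reads for fam, s in scores.items()}
```

With this weighting a read contributes a total of at most 1 regardless of how many copies of a repeat it matches, so highly repetitive families are not inflated simply by copy number.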

Gene expression analysis of MeCP2(R306C) mice. Consistent with nomenclat- 
ure from past descriptions of RTT missense mutations, the R306C nomenclature 
refers to the mouse MeCP2 isoform 2 (MeCP2_e2; NCBI Reference Sequence 
NP_034918). For gene expression analysis brain regions were dissected from male 
Mecp2(R306C)/y mice" and wild type littermates at 8-10 weeks of age and RNA was 
isolated as described above. Animals were preselected based on genotype before 
collection to ensure that paired samples were taken within litters, but collection was 
randomized and the experimenter was uninformed of genotype during collection, 
sample processing and analysis. Microarray analysis of cerebellar RNA was performed 
using the Affymetrix mouse exon 1.0 ST array platform. Analysis was performed 
in the Dana Farber microarray core facility following manufacturer’s recommen- 
dations. Analysis of hybridization data was performed as described above. Formal 
power analysis was not used to predetermine sample size; however, sample size (4 
per genotype) was determined based on previous detection of length-dependent 
gene expression effects in data sets that used similar sample sizes (see Extended 
Data Fig. 1 and Supplementary Table 1). 

Validation of microarray and RNA-seq findings. For reverse transcription-quantitative 
PCR expression analysis, candidate genes were selected for analysis 
in the visual cortex based on consistent upregulation in the MeCP2 KO (log2-fold-change 
greater than zero) and downregulation in the MeCP2 OE (log2-fold-change 
less than zero) across eight published microarray data sets in five brain regions 
(hypothalamus, cerebellum, amygdala, striatum, hippocampus). For Nanostring 
nCounter validation genes were selected based on the above criteria and evidence 
of upregulation in the visual cortex RNA-seq analysis. Genes with this profile were 
selected for qPCR assessment in the visual cortex. cDNA was generated from 500 ng 
of visual cortex total RNA (High-Capacity cDNA Reverse Transcription Kit, Applied 
Biosystems), and quantitative PCR was performed using transcript-specific primers 
(designed with the universal probe library design centre, Roche, Supplementary 
Table 2) and SYBR green detection on the Lightcycler 480 platform (Roche). Rela- 
tive transcript levels and fold-changes were calculated by normalizing qPCR signal 
within each sample to six genes that do not show evidence of altered expression 
across published microarray data sets (Supplementary Table 2). Similar results 
were obtained by analysing raw Cp values for test transcripts without normaliza- 
tion to control genes (data not shown). 

For non-amplification-based gene expression analysis, Nanostring nCounter 
reporter CodeSets were designed to detect candidate MeCP2-repressed genes in 
250 ng of total RNA extracted from MeCP2 KO and MeCP2(R306C) mice. Samples 
were processed at Nanostring Technologies, following the nCounter Gene Expres- 
sion protocol. Briefly, total RNA was incubated at 65 °C with reporter and capture 
probes in hybridization buffer overnight, and captured probes were purified and 
analysed on the nCounter Digital Analyzer. The number of molecules of a given 
transcript was determined by normalizing detected transcript counts to the geometric 
mean of ERCC control RNA sequences and a set of control genes that do not 
show evidence of altered expression across published microarray data sets. Hotelling 
T² test for small sample size*” was used to calculate significance in order to 
incorporate variance across both samples and genes. Significant differences between 
wild-type and MeCP2 KO or MeCP2(R306C) samples (P < 0.01) were also detected 
by paired two-tailed t-test comparing the paired mean values for each gene (averaged 
across samples within each genotype) between genotypes. 
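
Normalization to the geometric mean of control probes, as described for the nCounter data, can be sketched as follows. The array layout and function name are assumptions for illustration, and control counts are assumed to be strictly positive.

```python
import numpy as np


def normalize_to_controls(counts, control_rows):
    """Divide each sample's counts by the geometric mean of its control probes.

    counts: transcripts x samples array of raw counts.
    control_rows: row indices of the control probes (for example spike-in
                  RNAs and unchanged reference genes).
    """
    counts = np.asarray(counts, dtype=float)
    # Geometric mean per sample = exp(mean(log(control counts))).
    geo_mean = np.exp(np.log(counts[control_rows]).mean(axis=0))
    return counts / geo_mean
```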

Electromobility shift assays. Oligonucleotide probes (Integrated DNA Technologies) 
were 5′-32P-end-labelled by T4 polynucleotide kinase (New England Biolabs) 
with [γ-32P]ATP (Perkin Elmer) under conditions recommended by the enzyme 
supplier. 5′-32P-end-labelled upper strands were purified over NucAway Spin 
Columns (Ambion) and annealed to equal molar concentration of the appropriate 
unlabelled complement strand in 10 mM Tris, pH 8.0, 50 mM NaCl, 1 mM EDTA 
at 95 °C for 5 min, followed by slow cooling to room temperature. Similarly, unlabelled 
competitors were annealed. Proper annealing of probes and competitors was 
verified by native gel electrophoresis. 

For binding reactions using the MBD fragment of MeCP2, each reaction 
contained 180 ng of protein (amino acids (AA) 81-170, Abnova or AA 78-162, 
Diagenode), 50 fmol of 5′-32P-end-labelled probe with an excess of an unlabelled 
competitor in the presence of 1 µg of poly-dIdC (Sigma), 1X Tris-borate-EDTA 
(TBE) buffer, 1 mM DTT, 20 mM HEPES, pH 7.5, 0.5 mM EDTA, 0.2% Tween-20, 
30 mM KCl, and 1X Orange DNA loading dye (Thermo Scientific). Binding was 
carried out in a 10 µl volume for 10 min at room temperature. Each reaction was 
loaded on a 10% non-denaturing polyacrylamide (37.5:1, acrylamide/bis-acrylamide) 
gel in 1X TBE buffer and electrophoresed for 30 min at 240 V on ice. For binding 
reactions using the full-length MeCP2 protein, each reaction contained 60 ng of 
protein (AA 1-486, Millipore), 100 fmol of 5′-32P-end-labelled probe with an excess 
of unlabelled competitor in the presence of 250 ng of poly-dIdC (Sigma), 0.5X Tris-borate-EDTA 
(TBE) buffer, 1 mM DTT, 20 mM HEPES, pH 7.5, 0.5 mM EDTA, 
0.2% Tween-20, 30 mM KCl, and 1X Orange DNA loading dye (Thermo Scientific) 
in a 10 µl reaction volume for 10 min at room temperature. Each reaction was 
loaded on a 6% non-denaturing polyacrylamide gel (Life Technologies) in 0.5X 
TBE buffer and electrophoresed for 25 min at 300 V on ice. Gels were then dried 
on Whatman filter paper on a gel drier at 80 °C for 1 h. For imaging, dried gels were 
exposed to film overnight (Kodak X-Omat XB film) at −80 °C. 

Whole-genome bisulfite sequencing and analysis. For bisulfite sequencing analysis, 
cerebella and cortices from four eight-week-old mice were dissected and genomic 
DNA extracted. Starting with 25 ng of genomic DNA, 0.25 ng of unmethylated 
lambda DNA was added and libraries were generated using the Ovation Ultralow 
Methyl-Seq Library System (Nugen). Bisulfite treatment was performed using the 
EpiTect bisulfite conversion kit (Qiagen) following manufacturer’s instructions. 
Libraries were constructed using TruSeq reagents (Illumina) and sequenced on 
the HiSeq 2000 or MiSeq instruments (Illumina). Reads were mapped to the mm9 
genome using BS seeker”, allowing up to four mismatches. Duplicate reads were 
removed and only uniquely mapping reads were kept (Supplementary Table 1 for 
details). For analysis of published bisulfite sequencing data sets'**“, short read files 
were downloaded from GEO, mapped, and analysed as described above, or pro- 
cessed data files showing number of reads and number of non-converted reads per 
cytosine base were used (Supplementary Table 1 for details). Methylation levels in 
all data sets were calculated as number of cytosine base calls/(number of cytosine 
+ number of thymine base calls) within mapped reads at genomic sites where the 
reference genome encodes cytosine. For hydroxymethylation analysis, the same 
approach was applied to Tet-assisted bisulfite sequencing (TAB-seq) data from 
cortical tissue**. To examine the effects of gene body methylation independently 
of promoters, only genes greater than 4.5 kb and with a minimal coverage of CGs 
and CHs were used in our analysis, and methylation levels within regions of the 
transcription start site +3 kb to transcription end site were calculated by taking 
the average methylation levels for all reads mapping within this region. Compar- 
ison to gene expression data was performed using corresponding microarray expres- 
sion values for the hippocampus and the cerebellum or RNA-seq from the cortex. 
To facilitate fold-change analysis of RNA-seq data, the genes analysed were filtered 
for minimal (non-zero) expression values. 
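
The methylation-level formula and the gene-body restriction can be sketched as below. This is a simplified illustration rather than the authors' code: it assumes a plus-strand gene, per-site tuples of (position, cytosine calls, thymine calls), and pooling of calls across all sites in the region; the minimum-coverage filter on CGs and CHs mentioned in the text is not specified there and is omitted.

```python
def gene_body_methylation(sites, tss, tes, offset=3000, min_length=4500):
    """Pooled methylation level, C / (C + T), from TSS + offset to the TES.

    sites: iterable of (position, c_calls, t_calls) at reference cytosines.
    Returns None for genes of min_length or shorter, or with no coverage.
    """
    if tes - tss <= min_length:
        return None  # only genes longer than 4.5 kb are analysed
    c_total = t_total = 0
    for pos, c_calls, t_calls in sites:
        if tss + offset <= pos <= tes:
            c_total += c_calls  # non-converted, read as C: methylated
            t_total += t_calls  # converted, read as T: unmethylated
    covered = c_total + t_total
    return c_total / covered if covered else None
```

Skipping the first 3 kb keeps promoter-proximal methylation out of the gene-body estimate, which is the stated purpose of the restriction.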

MeCP2 chromatin immunoprecipitation analysis. MeCP2 ChIP analysis was 
performed on cortex and cerebella dissected from 8-week-old wild-type male mice 
as previously described***’. To facilitate direct comparison of MeCP2 ChIP to 
published frontal cortex DNA methylation and hydroxymethylation data”*, we 
also performed MeCP2 ChIP analysis using the same brain region at the same 
developmental stage (frontal cortex isolated from 6-week-old mice). ChIP DNA 
was cloned into libraries and sequenced on the Illumina HiSeq 2000 or HiSeq 2500 
platform to generate 49 or 50 bp single-end reads. Reads were mapped to mouse 
genome mm9 using BWA* and custom Perl scripts were employed to quantify read 
density (reads per kb) for each gene. Normalized read density values were calcu- 
lated as reads per kb in each genomic feature (for example, gene), normalized to the 
total number of reads sequenced for each sample, and divided by the reads per kb in 
that feature for the input DNA that was isolated before the ChIP and sequenced in 
parallel. As with the methylation analysis, gene bodies were defined as +3,000 bp 
to the predicted transcription termination site in the RefSeq gene model. To ensure 
sufficient coverage and accurate assessment of density in gene bodies, only genes 
greater than 4,500 bp in total length with at least one read in the input sample were 
included in the analysis. 
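
The input-normalized read density defined above can be written out as a small function. Parameter names are hypothetical; the calculation follows the description (reads per kb, scaled by the total reads in each library, then ChIP divided by input).

```python
def normalized_chip_density(chip_reads, chip_total, input_reads, input_total,
                            feature_bp):
    """ChIP/input ratio of depth-normalized reads per kb for one feature.

    Returns None when the input sample has no reads in the feature, mirroring
    the requirement of at least one input read.
    """
    if input_reads == 0:
        return None
    kb = feature_bp / 1000.0
    chip_density = chip_reads / kb / chip_total
    input_density = input_reads / kb / input_total
    return chip_density / input_density
```

For a gene body the `feature_bp` argument would be the length from +3,000 bp to the annotated transcription termination site.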

To explore the relationship between MeCP2 binding and mCA at high resolu- 
tion, we also quantified the MeCP2 ChIP signal from the frontal cortex in 500-bp 
bins tiled for all genes in the genome and compared it to mCA levels derived from 
high-coverage DNA methylation analysis of this brain region (Extended Data 
Fig. 4)**. In addition, we employed the MACS” algorithm to identify sites of 
MeCP2 ChIP enrichment, or ‘summits’, across the genome and looked for evidence 
of mCN at these sites. Due to the broad binding of MeCP2 across the genome, 
MeCP2 ChIP yields numerous sites of modest local enrichment (~twofold), not 
isolated, highly-enriched peaks (>tenfold) that are characteristic of transcription 
factors. Thus, to define MeCP2 summits, we used a low threshold of MeCP2 ChIP 
over input enrichment (>onefold) and a low stringency P value threshold (P < 0.2), 
which yielded 31,479 summits of MeCP2 ChIP signal. Aggregate plots across all 
31,479 MeCP2 summits were generated using the annotatePeaks.pl program in the 
Hypergeometric Optimization of Motif EnRichment (HOMER)" software. Input- 
normalized MeCP2 ChIP signal was calculated as the ratio of MeCP2 ChIP/input 
read coverage. Log2 enrichment of mCN under MeCP2 summits was determined 
by calculating the level of methyl-cytosine (number of non-converted cytosines 
sequenced)/(number of converted and non-converted cytosines sequenced) 
occurring at CA, CC, CT, or CG positions in the genome, normalized to the flanking 
region (mean of −4 kb to −3 kb and 3 kb to 4 kb region relative to the MeCP2 
summit). The average value for the ChIP signal or relative mCN was then calcu- 
lated for windows (100-bp for ChIP, 10-bp for mCN) tiled across each summit 
location and averaged across all of the 31,479 summits of MeCP2 ChIP enrichment 
identified using the MACS peak-calling algorithm’ (red) and 31,479 randomly 
selected control sites (grey). 

Analysis of Dnmt3a fl/fl; Nestin-Cre+/− mice. Female Dnmt3a fl/fl mice’s (kindly 
provided by M. Goodell) were bred to male Nestin-Cre+/− mice” to generate 
Dnmt3a fl/+; Nestin-Cre+/− animals. To ensure expression of the imprinted Nestin-Cre 
transgene, male Dnmt3a fl/+ Tg(Nes-cre)1Kln/J animals were bred to Dnmt3a fl/fl 
females to generate Dnmt3a fl/fl Tg(Nes-cre)1Kln/J conditional knockout mice 
(‘Dnmt3a cKO’) and Dnmt3a fl/fl control animals (‘control’). For western blot, 
DNA methylation and gene expression analyses, cerebella were dissected from 
10-11-week-old animals. Proteins were resolved by SDS-PAGE and immunoblotted 
using the following antibodies: Dnmt3a (Abcam, ab13888), MeCP2 (custom 
antisera*’) and Gapdh (Sigma Aldrich, #G9545-25UL). Genotyping for the Dnmt3a 
locus was performed by PCR with primers flanking both loxP sites 
(F: 5′-GCAGCAGTCCCAGGTAGAAG-3′, R: 5′-ATTTTTCATCTTACTTCTGTGGCATC-3′) 
on DNA derived from tails. The presence of the Cre allele was detected using 
primers to this transgene (F: 5′-GCAAGTTGAATAACCGGAAATGGTT-3′, 
R: 5′-AGGGTGTTATAAGCAATCCCCAGAA-3′). This genotyping scheme allows 
for simultaneous assessment of the presence of the floxed allele and the relative 
level of loxP recombination that has occurred in the sample. Brain-specific recombination 
was confirmed by PCR of tail DNA compared to cerebellar DNA (see Extended 
Data Fig. 7). For gene expression analysis RNA was extracted and analysed 
as described above for MeCP2(R306C) cerebellum samples. 

Identification and analysis of MeCP2-repressed genes. To facilitate identifica- 
tion of genes repressed by MeCP2 in the context of extremely small changes in gene 
expression, we analysed the 14,168 common genes quantified across eight pub- 
lished microarray ‘training data sets’ in five brain regions (hypothalamus, cerebel- 
lum, amygdala, striatum, hippocampus), applying the lowest possible threshold for 
fold-change (fold-change >0 in the MeCP2 KO, fold-change <0 in the MeCP2 
OE) but demanding consistent misregulation in the predicted direction (at least 7 
out of 8 data sets). Genes meeting this minimal threshold for direction of change 
were then filtered for minimum average change in gene expression (>7.5%), yielding 
466 MeCP2-repressed genes (Supplementary Table 3). To determine if these 
466 genes represent a significant population of reproducibly affected genes in MeCP2 
mutants above what would be expected by chance, we performed 7 × 10^5 resampling 
iterations, calculating the number of genes meeting the MeCP2-repressed 
criteria when the gene identity was randomized with respect to the calculated fold-change. 
This analysis yielded an average of 31 genes per iteration (observed/expected 
= 466/31 = 15) and did not detect an instance of 466 or more genes meeting 
the MeCP2-repressed criteria (maximum of 60 genes per iteration), thus yielding 
a significance of P < 1.5 × 10^−6. The robustness of this gene list for predicting 
misregulation in Mecp2 mutants is demonstrated by the reproducible upregulation 
of these genes in the ‘test data sets’ in Extended Data Fig. 8. Negative control data 
sets used in this analysis to test for specificity were identified through a survey of 
available GEO data sets. To qualify for analysis they were required to have a mini- 
mum number of biological replicates similar to the MeCP2 data sets (>4) and to 
have been analysed on either of the microarray platforms used for the training data 
sets (Affymetrix MoGene 1.0 ST, or MoExon 1.0 ST). For individual gene analysis 
we calculated the significance of misregulation for individual example genes across 
the 10 Mecp2 mutant data sets displayed in Extended Data Fig. 8 as follows: after 
confirming a normal distribution of fold-change values in each data set, we calcu- 
lated a z score for the fold-change of each gene in each data set. Assuming the null 
hypothesis that each gene would be randomly sampled from a standard normal 
distribution, a t statistic was derived from the mean and standard error of the gene’s 
z scores across the data sets, and this sample’s P value was calculated from the t 
distribution for nine degrees of freedom. While the analysis presented here utilizes 
these 466 genes identified on the criteria described above, similar results for gene 
length, enriched overlap with FMRP target genes, and enrichment for neuronal 
annotations were obtained with gene lists generated using alternative criteria (for 
example, up in MeCP2 KO, down in MeCP2 OE in 8 out of 8 data sets without minimum 
expression threshold). 
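
The selection criteria and the resampling test for MeCP2-repressed genes can be sketched in Python. Several details are assumptions made for illustration and are not stated in the text: fold-changes are taken as log2 values, the "average change" is computed from the mean direction-corrected log2 fold-change, and gene identity is randomized by shuffling each data set independently. All function names are hypothetical.

```python
import numpy as np


def repressed_mask(log2_fc, is_ko, min_consistent=7, min_change=0.075):
    """Flag genes up in KO and down in OE in at least min_consistent data sets.

    log2_fc: genes x data sets array; is_ko: True for KO, False for OE sets.
    """
    fc = np.asarray(log2_fc, dtype=float)
    sign = np.where(np.asarray(is_ko), 1.0, -1.0)
    oriented = fc * sign  # positive = direction expected for a repressed gene
    consistent = (oriented > 0).sum(axis=1) >= min_consistent
    mean_change = 2.0 ** oriented.mean(axis=1) - 1.0  # assumed definition
    return consistent & (mean_change > min_change)


def resample_counts(log2_fc, is_ko, n_iter=1000, seed=0, **criteria):
    """Number of genes passing the criteria after shuffling gene identity."""
    rng = np.random.default_rng(seed)
    fc = np.asarray(log2_fc, dtype=float)
    counts = np.empty(n_iter, dtype=int)
    for i in range(n_iter):
        shuffled = np.column_stack([rng.permutation(col) for col in fc.T])
        counts[i] = repressed_mask(shuffled, is_ko, **criteria).sum()
    return counts


def empirical_p(counts, observed):
    """Upper bound on P: (exceedances + 1) / (iterations + 1)."""
    hits = int((np.asarray(counts) >= observed).sum())
    return (hits + 1) / (len(counts) + 1)
```

When no resampled iteration reaches the observed count, the bound is roughly one over the number of iterations, which is why the attainable significance is limited by how many iterations are run.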

Gene ontology analysis was performed using the DAVID v6.7 bioinformatics 
resource” (http://david.abcc.ncifcrf.gov/), using the 14,168 genes covered in our 
analysis as background. Overlap of MeCP2-repressed genes with FMRP target 
genes was performed by mapping putative FMRP target lists*** to the 14,168 genes 
used for identification of MeCP2-repressed genes. Data processing, plotting, and 
statistical analysis were performed using available packages and custom scripts in R. 
Brain-specific expression of long genes. To assess expression of long genes across 
neural and non-neural tissues, RNA-seq data sets for seven mouse tissues dissected 
from eight-week-old mice“ and ten human tissues*” were mapped and quantified 
as described above. Similar results of brain-specific long gene expression were 
obtained for microarray data from the wild type samples of the five brain regions 
analysed in Mecp2 mutant studies compared to the wild type liver (data not shown). 
Neuronal cell culture and topotecan treatment. Primary cortical neurons were 
prepared from E16.5 mouse embryos and cultured as described by Kim et al.**. For 
lentiviral-mediated shRNA knockdown, virus was prepared as described in Tiscornia 
et al.“ using the MeCP2 shRNA and control shRNA plasmids previously validated 
in Zhou et al.”. Virus was concentrated and titrated using the GFP signal expressed 
from IRES GFP in the virus. After one day in vitro (DIV), cells were infected with 
lentivirus (knockdown or control) at an MOI of ~5, such that >90% of cells were 
infected. On DIV 4 cells were fed (neurobasal media with AraC, 2 µM final concentration) 
and subsequently treated with various dilutions of topotecan in DMSO 
(0.05% DMSO final concentration). At DIV 10, cells were collected in TRIzol for 
RNA analysis, or protein gel loading buffer for protein. RNA samples were pro- 
cessed and analysed using the Nanostring nCounter assay as described above, with 
the exception that 6 control genes were used for normalization. Western blot analysis 
to confirm knockdown of MeCP2 was performed as described in Chen et al.”’. 
Mean values shown in Extended Data Fig. 9 (n = 3-5) are derived from separate 
cultures obtained from independent litters of mice (independent biological repli- 
cates), dissected on separate days, cultured and collected independently. 
Regulatory Approval. All animal experiments were performed in accordance with 
regulations and procedures approved by the Harvard Medical Area Standing Committee 
on Animals (HMA IACUC). 

Extended Data Figure 1 | Analysis of gene expression changes in Mecp2 
mutant mice. a, Heatmap of median gene lengths for genes identified as 
misregulated in Mecp2 mutant studies or sixteen different studies of 
neurological dysfunction and disease in mice. Mouse model and GEO accession 
number, or reference, are listed (for Strand et al. (1), 3NP treatment; (2), human 
HD brain; (3), R2/6 Htt transgenic). b, Scatter plots of fold-change in gene 
expression in the MeCP2 KO for the amygdala (left), which shows robust 
length-dependent misregulation, and the liver (right), which does not. Fold- 
change values for genes (black points) and mean fold-change for 200 genes per 
bin with a 40 gene step are shown (mean, red line; ribbon, s.e.m.). c, The 
fraction of genes showing fold-change >0 for data sets in b; genes binned 
by length (100 gene bins, 50 gene step). d-f, Analysis of published 
microarray” ’ (d, e) or RNA sequencing (RNA-seq)"* (f) data sets from MeCP2 
KO (d, f) or OE (e) mice. Mean fold-change in expression (200 gene bins, 
40 gene step), red line; ribbon, s.e.m. For d-f, mean (black line) and two 
standard deviations (grey ribbon) are shown for 10,000 resamplings in which 
gene lengths were randomized with respect to fold-change. The spike in mean 
fold-change at ~1 kb in several plots corresponds to the olfactory receptor 
genes (Supplementary Discussion). g, Mean changes in expression of genes 
binned by length from RNA-seq analysis of MeCP2 KO cortex (n = 3 per 
genotype). h, Mean changes in expression from microarray analysis of genes 
binned by length in MeCP2(R306C) cerebellum (n = 4 per genotype). 
i, Heatmap summary of fold-changes in gene expression from RNA-seq 
analysis of Mecp2 mutant mean in g compared to Nanostring nCounter (18 
genes, top) or RT-qPCR (17 genes, bottom) analysis from cortex (n = 4 per 
genotype). Selected long genes (>100 kb) consistently upregulated in the 
MeCP2 KO or downregulated in MeCP2 OE mutant mice across brain tissues 
were tested (Supplementary Table 2). A statistically significant upregulation 
of these genes is observed in the cortex for both MeCP2 KO (nCounter, 
P = 0.00073; qPCR, P < 1 × 10 '*) and MeCP2(R306C) (nCounter, 
P = 0.0482; qPCR, P = 1.69 × 10°; Hotelling T² test for small sample size*’). 
Note that for completeness, data from other figures have been re-presented 
here. See Methods and Supplementary Table 1 for sample sizes from published 
data sets and other details. 

Extended Data Figure 2 | Timing and severity of gene expression changes in 
models of RTT. a, Mean fold-change in gene expression versus gene length in 
the hippocampus of MeCP2 KO mice compared to wild type at four and 
nine weeks of age reveals increasing magnitude of length-dependent gene 
misregulation that parallels the onset of RTT-like symptoms in these animals*. 
b, Mean fold-change in gene expression versus gene length in hippocampus 
of mice expressing truncated forms of MeCP2 mimicking human disease-causing 
alleles at four weeks of age. Re-expression of a longer truncated form 
of MeCP2(G273X) in the MeCP2 KO normalizes expression of long genes 
more effectively than expression of a shorter truncation of MeCP2(R270X), 
and parallels the higher degree of phenotypic rescue observed in 
MeCP2(G273X)-expressing mice compared to MeCP2(R270X)-expressing 
mice®. c, Mean fold-change in gene expression versus gene length in 
hippocampus of mice expressing truncated MeCP2 at nine weeks of age. 
Consistent with the eventual onset of symptoms of these mouse strains, 
length-dependent gene misregulation is evident in both strains. d, Changes in 
gene expression for genes binned by length in human MECP2 null ES cells 
differentiated into neural progenitor cells, neurons cultured for 2 weeks, or 
neurons cultured for 4 weeks”. In all plots, lines represent mean fold-change in 
expression for each bin (200 gene bins, 40 gene step), and the ribbon is s.e.m. 
of genes within each bin. See Methods and Supplementary Table 1 for all 
sample sizes and other details. 

Extended Data Figure 3 | High affinity of MeCP2 for mCG, mCA and 
hmCA in electrophoretic mobility shift assays. a, Binding of the recombinant 
methyl-binding domain (MBD) of MeCP2 (amino acids 81-170) to 32P-end-labelled
oligonucleotides containing a methylated cytosine in a CA (left) or
a CG (right) context competed with unlabelled competitor substituted with
unmethylated, methylated, or hydroxymethylated cytosine in a CG or CA 
context (indicated in bold). Representative full gels showing shifted and 
unshifted probe in the presence of 50-fold excess of unlabelled competitor 
(top); close-up of shifted bands over a range of unlabelled competitor (bottom). 
An mCA-containing oligonucleotide competes for MeCP2 binding with efficacy equal to
or higher than that of a symmetrically methylated CG oligonucleotide.
While hmCG-containing probes compete with similar efficacy to an 
unmethylated probe, a hmCA-containing probe competes with high efficacy. 
This difference in affinity of MeCP2 for hmCA- and hmCG-containing probes
may explain conflicting results reported for the affinity of MeCP2 for
hydroxymethylated DNA***°** (Supplementary Discussion). b, Binding and
competition of recombinant MeCP2 MBD (amino acids 78-162, left) or
full-length MeCP2 (amino acids 1-486, right) incubated with 32P-end-labelled
oligonucleotides containing a methylated cytosine in a CA context and 
competed with oligonucleotides containing unmethylated, methylated, or 
hydroxymethylated cytosine in a CG, CA, CT, or CC context. Representative 
full gels showing 100-fold excess of unlabelled competitor (top); close-up of 
shifted bands over a range of unlabelled competitor (bottom). The results 
obtained from competitors containing mCG, mCA, hmCG and hmCA are 
similar to those shown in a. In addition, both (h)mCT- and (h)mCC- 
containing oligonucleotides compete for MeCP2 binding with similar efficacy 
to that of an unmethylated probe. All results shown were observed in at least 
two independent experiments. 
Extended Data Figure 4 | ChIP-seq analysis of MeCP2 binding in vivo.
a, Boxplots of input-normalized read density within gene bodies (TSS +3 kb to
TTS) for MeCP2 ChIP from the mouse frontal cortex plotted for genes 
according to quartile of mCA/CA, mCG/CG, hmCA/CA and hmCG/CG in the 
frontal cortex” for all genes and genes >100 kb. b, Similar analysis of MeCP2 
ChIP from the mouse cortex (left) or cerebellum (right) plotted for genes 
according to quartile of mCA/CA or mCG/CG for all genes and genes > 100 kb. 
MeCP2 ChIP-signal is correlated with mCA/CA levels from the frontal cortex, 
cortex, and cerebellum for all genes and this correlation is more prominent 
among genes >100 kb. mCG does not show as prominent a correlation with 
MeCP2 ChIP signal, and hmCG trends towards anti-correlation with MeCP2 
ChIP. These results suggest that MeCP2 has a lower affinity for hmCG than
mCG, suggesting that, in vivo, hmCG is associated with reduced MeCP2
occupancy (Supplementary Discussion). c, High-resolution analysis of
high-coverage bisulfite sequencing data from the frontal cortex showing a correlation
between MeCP2 ChIP signal and mCA. Input-normalized ChIP signal plotted 
for mCA levels for 500-bp bins tiled across all genes. d, Aggregate plots of 
MeCP2 input-normalized ChIP signal (top) and relative methylation (log2
enrichment in mC as compared to the flanking regions) for mCA, mCC, mCT,
and mCG (bottom) are plotted around the 31,479 summits of MeCP2 ChIP 
enrichment identified using the MACS peak-calling algorithm” (red) or 31,479 
randomly selected control sites (grey, see Methods). See Methods and 
Supplementary Table 1 for sample sizes and other details. 
Extended Data Figure 5 | Genomic analysis of mCG, hmCG, and hmCA in
length-dependent gene regulation by MeCP2. a-c, Mean methylation of
CG dinucleotides (mCG/CG) within gene bodies (transcription start site
+3 kb, up to transcription termination site) in the cortex (a), hippocampus (b)
and cerebellum (c) for genes binned according to length. d-f, Mean fold-change 
in gene expression in MeCP2 KO compared to wild type in the cortex (d), 
hippocampus (e), and cerebellum (f) for genes binned according to mCG levels 
(mCG/CG) within gene bodies. g, Mean hmCG levels (hmCG/CG) within gene 
bodies in the frontal cortex” for genes binned according to length. h, Mean 
fold-change in gene expression in MeCP2 KO compared to wild type for genes
binned according to hmCG levels (hmCG/CG) within gene bodies in the
frontal cortex”. i, Mean hmCA levels (hmCA/CA) within gene bodies in the
frontal cortex” for genes binned according to length. j, Mean fold-change in
gene expression in MeCP2 KO compared to wild type for genes binned according
to hmCA levels (hmCA/CA) within gene bodies in the frontal cortex”. In
all panels, mean values for each bin are indicated as a line (200 gene bins, 40
gene step); ribbon depicts s.e.m. for genes within each bin. See Methods and 
Supplementary Table 1 for sample sizes and other details. 
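
The binning used throughout these captions (200 gene bins, 40 gene step, ribbon showing s.e.m.) can be read as a sliding window over genes sorted by length. The following is a minimal Python sketch under that reading, not the authors' code: the function name `sliding_bins`, the use of the mean length as each bin's x position, and the dropping of a trailing partial window are all assumptions.

```python
import math
import statistics


def sliding_bins(genes, bin_size=200, step=40):
    """Summarize (length, value) pairs in overlapping bins of sorted genes.

    genes: iterable of (gene_length, value) pairs, e.g. value = log2 fold-change.
    Returns one (mean_length, mean_value, sem) tuple per window.
    Assumption: a trailing window shorter than bin_size is dropped.
    """
    ordered = sorted(genes, key=lambda g: g[0])  # sort genes by length
    summary = []
    for start in range(0, len(ordered) - bin_size + 1, step):
        window = ordered[start:start + bin_size]
        lengths = [g[0] for g in window]
        values = [g[1] for g in window]
        # s.e.m. = sample standard deviation / sqrt(n) within the bin
        sem = statistics.stdev(values) / math.sqrt(len(values))
        summary.append((statistics.fmean(lengths), statistics.fmean(values), sem))
    return summary


if __name__ == "__main__":
    # Hypothetical toy data: 1,000 genes whose value rises with length.
    toy = [(1000 * (i + 1), 0.001 * i) for i in range(1000)]
    bins = sliding_bins(toy)
    print(len(bins), bins[0], bins[-1])
```

With a 40-gene step and 200-gene bins, adjacent bins share 160 genes, so neighbouring points on the plotted line are strongly correlated and the ribbon describes spread within a bin rather than independent estimates.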
Extended Data Figure 6 | Genomic analysis supports a role for mCA in
length-dependent gene regulation by MeCP2. a-c, Mean methylation at CA
dinucleotides (mCA/CA) within gene bodies (TSS +3 kb to TTS) in cortex (a), 
hippocampus (b), and cerebellum (c) for genes binned by length. d-f, Mean 
changes in gene expression in cortex (d), hippocampus (e), and cerebellum (f) 
of MeCP2 KO for high mCA genes (top 25% mean gene body mCA/CA) and 
low mCA genes (bottom 66% mean gene body mCA/CA) binned by length. 
g-i, Mean changes in gene expression in cortex (g), hippocampus (h), and 
cerebellum (i) of MeCP2 KO for genes binned according to average gene 
body mCA/CA levels. j-l, Mean changes in gene expression in cortex (j),
hippocampus (k), and cerebellum (l) of MeCP2 KO mice for long genes (top
25%) and short genes (bottom 25%) in each brain region binned by gene body 
mCA/CA level. A correlation between fold-change in the MeCP2 KO and
mCA/CA is less prominent, or not observed, in the hippocampus
and cerebellum for all genes together (h, i), but it is clear for the longest genes in
the genome analysed alone (k, l). Note that average levels of mCA appear
lower in hippocampus and cerebellum compared to cortex (compare y axis
in a, b and c), and may explain why a correlation across all genes is not detected
in these brain regions. In long genes analysed alone the cumulative effect of
higher mCA levels and integration across the gene may be larger, resulting in a 
detectable effect. In all panels, the line indicates the mean for 200 gene bins, 
with a 40 gene step; ribbon depicts s.e.m. for genes within each bin. Note that, 
for completeness, data from analysis of the cortex presented in Fig. 2 are 
re-presented here. See Methods and Supplementary Table 1 for sample sizes 
and other details. 
Extended Data Figure 7 | Conditional knockout of Dnmt3a in vivo.
a, Diagram of the Dnmt3a locus and Cre-dependent conditional knockout
strategy for Dnmt3a°°. LoxP sites (green triangles) flank exon 17, which is
removed following Cre-mediated recombination. Primers (purple arrows) were
designed to flank exons 17 and 18. The wild-type (WT), floxed (FLX), and
knockout (KO) alleles are depicted. b, Representative PCR genotyping for tail
DNA samples indicates presence or absence of the floxed (flx, ~800 bp), 
wild-type (WT, ~750 bp), and knockout (KO, ~500 bp) alleles. A separate
genotyping reaction for the Nestin-cre transgene (~250 bp) is shown.
c, Efficient excision of the floxed exon is detected in cerebellar DNA from
conditional knockout (Dnmt3a flx/flx; Nestin-Cre +/-, Dnmt3a cKO) mice but
not from control animals (Dnmt3a flx/flx, Control). d, Western blot analysis
of Dnmt3a, MeCP2, and Gapdh (loading control) protein from the cerebellum 
of control and Dnmt3a cKO adult mice. All results shown were observed in 
at least two independent experiments. 
Extended Data Figure 8 | Analysis of MeCP2-repressed genes and FMRP
target genes. a, Mean fold-change in mRNA expression for examples of
MeCP2-repressed genes across three different Mecp2 mutant genotypes
(KO, OE, and R306C) and six brain regions. P values for each gene are derived
from the mean z scores for fold-change across all data sets (see Methods).
b, Gene expression and CA methylation data from the cerebellum for selected
MeCP2-repressed genes from a (right), as well as examples of extremely
long genes (>100 kb) that are not enriched for mCA and are not misregulated
(left). Fold-changes in mRNA expression in Mecp2 mutants and the Dnmt3a
cKO are shown (left axis), as well as mean mCA levels (grey; right axis). Red
line indicates genomic median for gene body mCA/CA. c, Boxplots of mCA
levels in MeCP2-repressed genes compared to all genes. d, Mean fold-change 
for MeCP2-repressed genes in eight ‘training data sets’ used to define these 
genes (see Methods), and nine ‘test data sets’: three Mecp2 mutant data sets 
not used to define MeCP2-repressed genes (CTX MeCP2 KO and CB 
MeCP2(R306C), generated in this study; HC MeCP2 KO 4 week, analysed from 
Baker et al.*), and six data sets from brains of mouse models of neurological 
dysfunction generated using the same microarray platforms as the MeCP2 data 
sets (GEO accession numbers in order: GSE22115, GSE27088, GSE43051, 
GSE47706, GSE44855, GSE52584). Error bars are s.e.m. of MeCP2-repressed 
gene expression across samples (n = 4-8 microarrays per genotype per data 
set); **P < 0.01, one-tailed t-test, Benjamini-Hochberg correction. Note that 
significance testing was not performed on training data sets. Brain regions
are indicated as in Fig. 1 (WB, whole brain). e, Cumulative distribution function
(CDF) of gene lengths plotted exclusively for genes that are among the top 60%
of expression levels in the brain (Supplementary Discussion). The extreme 
length of MeCP2-repressed genes and genes encoding FMRP target mRNAs” 
when controlling for expression level indicates that the long length of these 
genesets is not a secondary effect of the preferential expression of long genes in 
the brain (P< 1 X 107! for each geneset versus all expressed genes; two- 
sample Kolmogorov-Smirnov test). f, The CDF of gene lengths for all genes 
compared to an independent set of FMRP targets identified by Brown and 
colleagues’ (P< 1 X 107 '°, Kolmogorov-Smirnov-test). g, CDF of gene 
lengths for genes expressed at similar levels in the brain and other somatic 
tissues (Supplementary Discussion). The extreme length of each geneset 
(P<1X 10 '°, Kolmogorov-Smirnov test) when filtering for genes that are 
expressed in all tissues indicates that regulation of long genes by MeCP2 

and FMRP is not dependent on brain-specific expression. h, CDF of mature 
mRNA lengths for MeCP2-repressed genes, and FMRP target genes 

(P<1 X10 '! for each geneset versus all genes, Kolmogorov-Smirnov test). 
i, Overlap of MeCP2-repressed genes and putative FMRP target mRNAs” 
(P<5X 10°, hypergeometric test). Expected overlap was calculated by 
dividing the expected overlap genome-wide (hypergeometric distribution) 
according to the distribution of all gene lengths in the genome. See Methods 
and Supplementary Table 1 for sample sizes and other details. 
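
The overlap test in panel i can be illustrated with an upper-tail hypergeometric probability: the chance of drawing at least the observed number of shared genes if one gene set were sampled at random from the genome. The Python sketch below is an illustration only. The function name `hypergeom_upper_tail` is an assumption, every number in the example except the 466 MeCP2-repressed genes is a hypothetical placeholder, and it omits the gene-length stratification that the caption describes for the expected overlap.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  size of the first gene set (e.g. MeCP2-repressed genes)
    draws:      size of the second gene set (e.g. putative FMRP targets)
    k:          observed number of genes shared by both sets
    """
    k = max(k, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum exact integer counts first, then divide once to limit rounding error.
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical placeholders: genome size, FMRP set size and overlap
    # are NOT taken from the paper; only 466 appears in the source.
    p = hypergeom_upper_tail(k=60, population=20000, successes=466, draws=800)
    print(f"P(overlap >= 60) = {p:.3g}")
```

Because both gene sets are strongly biased towards long genes, a genome-wide calculation like this one would overstate the significance, which is presumably why the caption computes the expected overlap according to the distribution of gene lengths.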
Extended Data Figure 9 | Consequences of long gene misregulation in
neurons. a, Mean expression of genes binned according to length in human 
neural and non-neural tissues. Mean expression for genes within each bin 
(200 gene bins, 40 gene step) is indicated by the line; ribbon represents the s.e.m. 
of genes within each bin. b, Western blot analysis of MeCP2 from primary 
cortical neurons after control or MeCP2 shRNA knockdown (KD) and 
treatment with DMSO vehicle (—) or topotecan (+). c, Heatmap summary 
of nCounter analysis for the expression of selected MeCP2-repressed (MR) 
genes from primary neurons treated with control or MeCP2 shRNA and 
topotecan (n = 3-4). Normalized log2 fold-change relative to the
DMSO-treated, control KD is shown. MeCP2 KD conditions are significantly different
from control (P = 1 × 10⁻⁴, repeated measures ANOVA across 8 genes).
Newman-Keuls corrected, post-hoc comparisons: P < 0.05, control KD, 0 nM
drug versus MeCP2 KD, 0 nM drug; P > 0.05, control KD, 0 nM drug versus
MeCP2 KD, 50 nM drug; P < 0.05, MeCP2 KD, 0 nM drug versus MeCP2 KD,
50 nM drug. d, Bioanalyzer profiles of 18S and 28S ribosomal RNA (top) and
total RNA quantification (bottom) for treated neurons (n = 3-5). Total RNA 
values normalized to DMSO-treated control KD, red dashed line. Two-way 
repeated measures ANOVA indicates a significant effect of KD (P < 0.01) and 
drug treatment (P < 0.05). Rescue assessed by one-tailed t-test, Bonferroni 
multiple testing correction, *P < 0.05. 
Extended Data Table 1 | Gene ontology analysis of MeCP2-repressed genes and genes >100 kb

MeCP2-repressed genes (466 genes)

GO term | Gene count | EASE P value | Fold enriched | Benjamini P value | GO accession

Biological Process
axon guidance | 17 | 3.7E-08 | 5.6 | 6.3E-05 | GO:0007411
axonogenesis | 21 | 6.5E-08 | 4.3 | 5.5E-05 | GO:0007409
cell morphogenesis involved in differentiation | 23 | 2.4E-07 | 3.7 | 1.4E-04 | GO:0000904
neuron projection morphogenesis | 21 | 2.6E-07 | 4 | 1.1E-04 | GO:0048812
cell morphogenesis involved in neuron differentiation | 21 | 3.6E-07 | 3.9 | 1.2E-04 | GO:0048667
neuron projection development | 23 | 3.9E-07 | 3.6 | 1.1E-04 | GO:0031175
neuron development | 26 | 9.3E-07 | 3.1 | 2.3E-04 |
cell projection morphogenesis | 21 | 1.3E-06 | 3.6 | 2.8E-04 |
cell morphogenesis | 26 | 2.1E-06 | 3 | 4.0E-04 |
cell part morphogenesis | [illegible] | 3.3E-06 | 3.4 | 5.5E-04 |
phosphate metabolic process | 49 | 3.3E-06 | 2 | 5.1E-04 | GO:0006796
phosphorus metabolic process | 49 | 3.3E-06 | 2 | 5.1E-04 | GO:0006793
cellular component morphogenesis | 27 | 4.2E-06 | 2.8 | 5.9E-04 | GO:0032989
cell projection organization | 26 | 5.2E-06 | 2.8 | 6.8E-04 | GO:0030030
enzyme linked receptor protein signaling pathway | 24 | 8.5E-06 | 2.9 | 1.0E-03 | GO:0007167

Cellular Component
plasma membrane | 110 | 2.00E-05 | 1.4 | 5.3E-03 | GO:0005886
cell junction | 29 | 4.30E-05 | [illegible] | 5.7E-03 | GO:0030054
cytoskeleton | 50 | 6.00E-05 | 1.8 | 5.4E-03 | GO:0005856
postsynaptic density | 8 | 2.40E-04 | 6.2 | 1.6E-02 | GO:0014069
synapse | 21 | 3.80E-04 | 2.4 | 2.0E-02 | GO:0045202
plasma membrane part | 64 | 8.20E-04 | 1.5 | 3.6E-02 | GO:0044459
cell fraction | 29 | 2.10E-03 | 1.8 | 7.9E-02 | GO:0000267
basement membrane | 8 | 2.60E-03 | 4.2 | 8.3E-02 | GO:0005604
neuron projection | 16 | 3.10E-03 | 2.4 | 8.7E-02 | GO:0043005
synapse part | 14 | 3.20E-03 | 2.6 | 8.3E-02 | GO:0044456
insoluble fraction | 26 | 3.50E-03 | 1.9 | 8.3E-02 | GO:0005626
membrane fraction | 25 | 4.50E-03 | 1.8 | 9.7E-02 | GO:0005624
postsynaptic membrane | 10 | 4.90E-03 | 3.1 | 9.7E-02 | GO:0045211

Molecular Function
cation binding | 148 | 5.00E-07 | 1.4 | 2.6E-04 | GO:0043169
metal ion binding | 147 | 5.50E-07 | 1.4 | 1.4E-04 | GO:0046872
ion binding | 149 | 7.50E-07 | 1.4 | 1.3E-04 | GO:0043167
calcium ion binding | 47 | 8.50E-06 | 2 | 1.1E-03 | GO:0005509
actin binding | 21 | 2.00E-04 | 2.6 | 2.1E-02 | GO:0003779
cytoskeletal protein binding | 26 | 3.80E-04 | 2.2 | 3.3E-02 | GO:0008092
protein kinase activity | 33 | 5.30E-04 | 1.9 | 3.9E-02 | GO:0004672
cation channel activity | 18 | 8.40E-04 | 2.5 | 5.3E-02 | GO:0005261
voltage-gated cation channel activity | 12 | 1.10E-03 | 3.2 | 6.3E-02 | GO:0022843
alkali metal ion binding | 16 | 1.40E-03 | 2.6 | 6.8E-02 | GO:0031420
metal ion transmembrane transporter activity | 19 | 1.90E-03 | 2.3 | 8.6E-02 | GO:0046873
voltage-gated ion channel activity | 14 | 1.90E-03 | [illegible] | 7.9E-02 | GO:0005244
voltage-gated channel activity | 14 | 1.90E-03 | 2.7 | 7.9E-02 | GO:0022832
potassium ion binding | 11 | 2.20E-03 | 3.2 | 8.6E-02 | GO:0030955

Genes longer than 100 kb (1,431 genes)

GO term | Gene count | EASE P value | Fold enriched | Benjamini P value | GO accession

Biological Process
phosphate metabolic process | 150 | 1.2E-18 | 2 | 3.4E-15 | GO:0006796
phosphorus metabolic process | 150 | 1.2E-18 | 2 | 3.4E-15 | GO:0006793
protein modification process | 191 | 8.3E-18 | 1.8 | 1.2E-14 | GO:0006464
protein amino acid phosphorylation | 120 | 4.1E-17 | 2.2 | 4.0E-14 | GO:0006468
biopolymer modification | 191 | 1.4E-15 | 1.7 | 1.0E-12 | GO:0043412
phosphorylation | 124 | 2.7E-15 | 2 | 1.6E-12 | GO:0016310
cellular component organization | 247 | 1.1E-14 | 1.6 | 5.5E-12 | GO:0016043
biological adhesion | 101 | 2.4E-14 | [illegible] | 1.0E-11 | GO:0022610
cell adhesion | 101 | 2.4E-14 | 2.2 | 1.0E-11 | GO:0007155
post-translational protein modification | 156 | 1.4E-13 | 1.8 | 5.0E-11 | GO:0043687
cellular process | 849 | 1.6E-13 | 1.2 | 5.2E-11 | GO:0009987
nervous system development | 137 | 4.9E-13 | 1.8 | 1.4E-10 | GO:0007399
cell projection organization | 65 | 9.0E-11 | 2.3 | 2.4E-08 | GO:0030030
cell morphogenesis | 62 | 2.3E-10 | 2.3 | 5.6E-08 | GO:0000902
neuron development | 59 | 8.2E-10 | 2.3 | 1.8E-07 | GO:0048666

Cellular Component
synapse | 74 | 7.9E-17 | 2.8 | 4.4E-14 | GO:0045202
cell junction | 86 | 2.5E-13 | 2.3 | 5.0E-11 | GO:0030054
neuron projection | 58 | 3.4E-13 | 2.8 | 4.5E-11 | GO:0043005
cell projection | 96 | 1.4E-12 | [illegible] | 1.4E-10 | GO:0042995
cytoskeleton | 149 | 2.9E-12 | 1.7 | 2.3E-10 | GO:0005856
plasma membrane | 325 | 5.3E-12 | 1.4 | 3.5E-10 | GO:0005886
plasma membrane part | 205 | 5.5E-12 | 1.6 | 3.1E-10 | GO:0044459
extracellular matrix part | 30 | 2.5E-11 | 4 | 1.2E-09 | GO:0044420
basement membrane | 24 | 1.8E-09 | 4.1 | 8.0E-08 | GO:0005604
synapse part | 43 | 1.1E-08 | 2.6 | 4.2E-07 | GO:0044456
proteinaceous extracellular matrix | 56 | 1.4E-08 | 2.2 | 4.9E-07 | GO:0005578
axon | 31 | 1.8E-08 | [illegible] | 6.1E-07 | GO:0030424
extracellular matrix | 57 | 2.3E-08 | 2.2 | 7.1E-07 | GO:0031012
dendrite | 29 | 4.8E-08 | 3.1 | 1.4E-06 | GO:0030425
postsynaptic membrane | 29 | 2.2E-07 | 2.9 | 5.8E-06 | GO:0045211

Molecular Function
calcium ion binding | 138 | 1.4E-15 | 2 | 1.2E-12 | GO:0005509
protein kinase activity | 111 | 1.6E-15 | 2.2 | 6.6E-13 | GO:0004672
adenyl ribonucleotide binding | 209 | 9.3E-15 | [illegible] | 2.6E-12 | GO:0032559
cytoskeletal protein binding | 85 | 1.5E-14 | 2.4 | 3.1E-12 | GO:0008092
GTPase regulator activity | 72 | 1.2E-13 | 2.5 | 2.0E-11 | GO:0030695
nucleoside binding | 215 | 1.5E-13 | 1.6 | 2.1E-11 | GO:0001882
adenyl nucleotide binding | 212 | 2.6E-13 | 1.6 | 3.1E-11 | GO:0030554
purine nucleoside binding | 213 | 2.8E-13 | 1.6 | 2.9E-11 | GO:0001883
ATP binding | 202 | 3.0E-13 | 1.6 | 2.8E-11 | GO:0005524
nucleoside-triphosphatase regulator activity | [illegible] | 3.9E-13 | 2.5 | 3.3E-11 | GO:0060589
ion binding | 412 | 9.8E-12 | 1.3 | 7.5E-10 | GO:0043167
metal ion binding | 403 | 2.0E-11 | 1.3 | 1.4E-09 | GO:0046872
purine ribonucleotide binding | 229 | 2.5E-11 | 1.5 | 1.6E-09 | GO:0032555
ribonucleotide binding | 229 | 2.5E-11 | [illegible] | 1.6E-09 | GO:0032553
cation binding | 404 | 3.8E-11 | 1.3 | 2.3E-09 | GO:0043169

Functional annotation clustering analysis of genes identified as MeCP2-repressed and the longest genes in the genome (>100 kb) was performed using the DAVID bioinformatics resource (DAVID v6.7)“*. The top
fifteen enriched gene ontology terms with P < 0.01 (Benjamini multiple testing correction) are listed for ‘Biological Process’, ‘Cellular Component’, and ‘Molecular Function’, respectively.
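
The Benjamini-corrected P values in this table can be understood through the standard Benjamini-Hochberg step-up adjustment, sketched below in Python. This is a generic illustration, assuming the usual procedure. It is not DAVID's implementation, the function name `benjamini_hochberg` is an assumption, and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value is scaled by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downwards keeps the adjusted
    values monotonic and capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P values
    print(benjamini_hochberg(raw))
```

The adjustment depends on how many terms are tested together, so the same raw P value can map to different corrected values in the two gene sets.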
Dissemination, divergence and establishment of
H7N9 influenza viruses in China 


Tommy Tsan-Yuk Lam!*3*, Boping Zhou'*, Jia Wang”**, Yujuan Chai?**, Yongyi Shen?**, Xinchun Chen'*, Chi Ma’, 
Wenshan Hong’, Yin Chen‘, Yanjun Zhang", Lian Duan'*?, Peiwen Chen!”, Junfei Jiang’’, Yu Zhang”, Lifeng Li??,
Leo Lit Man Poon!’, Richard J. Webby”, David K. Smith”, Gabriel M. Leung?, Joseph S. M. Peiris+?, Edward C. Holmes*,
Yi Guan’? & Huachen Zhu?


Since 2013 the occurrence of human infections by a novel avian H7N9 
influenza virus in China has demonstrated the continuing threat 
posed by zoonotic pathogens'”. Although the first outbreak wave that 
was centred on eastern China was seemingly averted, human infec- 
tions recurred in October 2013 (refs 3-7). It is unclear how the H7N9 
virus re-emerged and how it will develop further; potentially it may 
become a long-term threat to public health. Here we show that H7N9 
viruses have spread from eastern to southern China and become 
persistent in chickens, which has led to the establishment of mul- 
tiple regionally distinct lineages with different reassortant genotypes. 
Repeated introductions of viruses from Zhejiang to other provinces 
and the presence of H7N9 viruses at live poultry markets have fuelled 
the recurrence of human infections. This rapid expansion of the 
geographical distribution and genetic diversity of the H7N9 viruses 
poses a direct challenge to current disease control systems. Our re- 
sults also suggest that H7N9 viruses have become enzootic in China 
and may spread beyond the region, following the pattern previously 
observed with H5N1 and H9N2 influenza viruses*”. 


The second wave of the H7N9 outbreak that began in late 2013 has
resulted in 318 human cases and over a hundred deaths as of 12 Sep- 
tember 2014 (ref. 7), more than twice that of the first wave. Guangdong, 
which had no reported human infection in the first wave, and Zhejiang 
have reported the highest numbers of human cases in the second wave’. 
We used influenza surveillance at live poultry markets (LPMs) in Zhe- 
jiang, Guangdong, Jiangxi, Jiangsu and Shandong provinces, at spe- 
cific times or routinely, from October 2013 to July 2014 (Extended Data 
Tables 1 and 2), and at hospitals in Shenzhen (Guangdong) from De- 
cember 2013 to April 2014, to trace the evolution and spread of the 
second wave of the H7N9 outbreak. 

Active surveillance in fifteen cities across these five provinces iden- 
tified 493 H7N9 viruses from oropharyngeal swabs of market chickens, 
with an average isolation rate of 3.0% (Extended Data Table 1 and Fig. 1). 
Only five H7N9 viruses were isolated from 2,465 cloacal swabs sam- 
pled in chickens in Jiangxi and Guangdong, giving an isolation rate of 
0.2%. No H7N9 virus was isolated from domestic ducks during this 
survey (Extended Data Table 2). These findings highlight that market 
chickens shedding H7N9 viruses via the oropharyngeal route are central
to the H7N9 outbreak. Human cases*” were reported in all of the
seven cities where H7N9 viruses were detected in chickens. The H7N9 
positive rates in Dongguan (Guangdong) increased from 3.2% (Decem- 
ber 2013) to 8.6% (February 2014) (Extended Data Table 1 and Fig. 1). 
Our routine surveillance in Nanchang (Jiangxi) and Shantou (Guang- 
dong) revealed that both cities were negative for H7N9 viruses until 
February 2014. Since then, H7N9 viruses have been detected every month 
up to July 2014, with isolation rates ranging from 2.0% to 15.4% in 
Nanchang, and from 1.0% to 7.5% in Shantou (Extended Data Table 1). 
No H7N9 viruses were detected at live poultry markets in the six cities 
in Shandong and Jiangsu that we sampled during October 2013 (Ex- 
tended Data Table 1). 
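
The isolation rates quoted above are simple proportions of positive samples. As a minimal Python sketch, the cloacal-swab figure (five isolates from 2,465 swabs) can be checked as follows; the function name `isolation_rate` is an assumption for illustration, and only the two counts come from the text.

```python
def isolation_rate(positives: int, samples: int) -> float:
    """Return the isolation rate as a percentage of samples tested."""
    if samples <= 0:
        raise ValueError("samples must be a positive integer")
    return 100.0 * positives / samples


if __name__ == "__main__":
    # Counts taken from the text: 5 H7N9 isolates from 2,465 cloacal swabs.
    print(f"{isolation_rate(5, 2465):.1f}%")  # prints 0.2%
```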

To examine the re-emergence of H7N9 in more detail we sequenced 
the complete genomes of 438 H7N9 and 263 related influenza viruses 
(including 194 H9N2) isolated from poultry from October 2013 to July 
2014, and 19 H7N9 human isolates obtained from hospitals in Shenzhen. 
Phylogenetic and phylogeographic analyses of the H7 haemagglutinin 
(HA) genes confirmed that all of the second wave avian and human 
H7N9 viruses were descended from the viruses of the first wave (Fig. 2a). 
Wave 1 HA genes from the affected provinces were very similar and
generally branched from a central node (shown as a red empty circle in 
Fig. 2a) of the wave 1 (W1) clade that fell at the base of the tree, indi- 
cative of a broad dissemination of the virus during the initial outbreak. 
In contrast, viruses of the second wave (W2) clustered into three major 
clades, designated as W2-A, W2-B and W2-C (Fig. 2a), all of which
emerged from the W1 clade. 

The W2-A clade contains viruses detected in multiple provinces from 
October 2013 to July 2014. Most virus isolates from Zhejiang were lo- 
cated near the root of this clade, while clusters of viruses from Jiangxi, 
Guangdong and Fujian diverged from some other Zhejiang viruses in 
this clade (for example, ZJ/30/2014; Fig. 2a). This suggests that this 
clade originated in Zhejiang, or the Yangtze River delta region, and then 
spread to other provinces (Supplementary Discussion 1.4; http://dx.doi. 
org/10.5061/dryad.5q7kf). The differing sub-clades of Jiangxi viruses 
in clade W2-A contain isolates sampled from February to April 2014, 
suggesting multiple introductions of the viruses, probably by poultry 
movement. 

Clade W2-B is likely to have originated from viruses closely related 
to those isolated in Guangzhou in May 2013 (Fig. 2a, Supplementary 
Discussion 1.4; http://dx.doi.org/10.5061/dryad.5q7kf) that apparently 
persisted in Guangdong over summer®”®. These viruses have prolife- 
rated since October 2013 and caused the largest number of human in- 
fections reported in the second outbreak wave**’. Viruses within this 
clade have only been detected in Guangdong, mainly in the Pearl River 
delta region, suggesting that they have become established and enzo- 
otic in the chicken population in this locality. 

Clade W2-C viruses were mainly isolated from Jiangxi from April 
to June 2014. Two recent human isolates from Taiwan’, which origi- 
nated from Jiangsu, fall in this clade close to the Jiangxi isolates (Fig. 2a), 
suggesting that W2-C viruses were also simultaneously prevalent in eastern
China. The outgroup viruses of this clade, detected in Zhejiang and
Shanghai**"?”, indicate that it may have originated in eastern China. A 
lack of surveillance data from the beginning of wave 2 prevents iden- 
tification of the early spread of this clade (Supplementary Discussion 
1.4; http://dx.doi.org/10.5061/dryad.5q7kf). Similar situations may have 
occurred in neighbouring provinces, such as Anhui and Hunan, where 
human cases have been reported*’, although no or few chicken viruses 
have been reported from these localities. 

Phylogenetic analysis of the N9 neuraminidase (NA) genes revealed 
a similar topology to that of the H7 HA tree, with the wave 2 viruses 
separated into three corresponding clades (available from http://dx. 
doi.org/10.5061/dryad.5q7kf). However, the N9 genes from the early 
Guangdong viruses did not initially form a distinct subclade. After Oc- 
tober 2013, the W2-B clade in the N9 genes became established. Some 
viruses isolated in Shantou (Guangdong) that possess a W2-A HA have 
reassorted to acquire the W2-B NA gene, suggesting the co-circulation 
of different clades of H7N9 viruses in Guangdong. 

The internal genes of all H7N9 viruses studied here belonged to the 
ZJ-HJ/07 lineage’* of H9N2 viruses, which is broadly classified into 
three clades (clades 1-3; Fig. 2b, Extended Data Fig. 1). In each internal 
gene the majority of wave 1 H7N9 viruses formed a sub-clade (denoted 
as clade 1.1) in clade 1. Wave 2 viruses had internal genes from the 
regionally distributed clade 2 and from clade 3, which was local to 
Guangdong. 

In the second wave, none of the avian and human H7N9 viruses in- 
herited all six internal gene segments from clade 1.1 (Fig. 2, Extended 
Data Figs 1 and 2). The early W2-A viruses mostly acquired PB2 and M 
segments from clade 2 of the ZJ-HJ/07 lineage. The later spread of these 
viruses to Jiangxi, Guangdong and Fujian led to the replacement of PB1 
and PA with clade 2 segments in most viruses, suggesting sequential re- 
assortment events with local H9N2 viruses (Fig. 2, Extended Data Fig. 1). 
The clade 1 origin NP and NS segments were generally retained in the 
W2-A viruses. The majority of W2-B viruses had four or five internal 
gene segments derived from clade 3. Almost all W2-B viruses retained 
the clade 1.1 M gene segment and approximately half had a clade 1.1 
origin PA segment. W2-C viruses mostly reassorted to acquire clade 2 
PB2, PA and M segments and a clade 3 NS segment, with their PB1 and 
NP segments remaining mainly from clade 1. Notably, two H7N6 viruses, 
which were reassortants of H7N9 W2-A viruses and H5N6 viruses co- 
circulating in the poultry of Jiangxi, were identified (Fig. 2). Thus, the 
evolution of the wave 2 H7N9 viruses from those of wave 1 has resulted 
in a major increase in genetic diversity (Extended Data Table 3, Sup- 
plementary Discussion 1.1). 
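
One way to keep track of the reassortant patterns described above is to record, for each virus, the clade of origin of each of the six internal gene segments. The Python sketch below is only an illustration: the genotypes are simplified, representative examples paraphrased from the preceding paragraph rather than data from the study, and the names `SEGMENTS`, `GENOTYPES` and `clade_counts` are assumptions.

```python
from collections import Counter

# The six internal gene segments discussed in the text.
SEGMENTS = ("PB2", "PB1", "PA", "NP", "M", "NS")

# Simplified representative genotypes (assumption: paraphrased from the
# prose; real isolates vary, e.g. "mostly" and "approximately half").
GENOTYPES = {
    "W2-A (after spread)": {"PB2": "2", "PB1": "2", "PA": "2",
                            "NP": "1", "M": "2", "NS": "1"},
    "W2-B (five clade 3)": {"PB2": "3", "PB1": "3", "PA": "3",
                            "NP": "3", "M": "1.1", "NS": "3"},
    "W2-B (four clade 3)": {"PB2": "3", "PB1": "3", "PA": "1.1",
                            "NP": "3", "M": "1.1", "NS": "3"},
    "W2-C": {"PB2": "2", "PB1": "1", "PA": "2",
             "NP": "1", "M": "2", "NS": "3"},
}


def clade_counts(genotype):
    """Count how many internal segments derive from each clade."""
    return Counter(genotype[segment] for segment in SEGMENTS)


if __name__ == "__main__":
    for name, genotype in GENOTYPES.items():
        print(name, dict(clade_counts(genotype)))
```

Encoding genotypes this way makes it straightforward to tabulate how many distinct segment combinations are circulating, which is the sense in which the text speaks of increased genetic diversity.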

Although amino acid changes occurred in the HA (Extended Data 
Table 4), limited antigenic differences were observed (Supplementary 
Data, Supplementary Discussion 1.2). Mutations associated with drug 
resistance in the NA protein’*”* only occurred in a small number of human 
isolates (Extended Data Table 5), probably reflecting a response to drug 
treatment. Consistent differences between the human and avian isolates 
were restricted to the PB2 residues 627 and 701 (Extended Data Table 5, 
Supplementary Discussion 1.3), which are frequently seen when an avian 
virus enters a mammalian host'*”. 

Our study has shown that the H7N9 influenza virus has diverged 
into distinct clades, becoming established in chickens and disseminat- 
ing to wider geographic regions. This probably occurred by poultry move- 
ment along trade routes, with the localization of the W2-B clade in 
turn reflecting limited exports from high poultry consuming areas. 
Human infections have mostly been reported from southern and eastern
China’ *”’®'*. With the recent reports of H7N9 infections in
Xinjiang’ in the far northwest of China, it is probable that the H7N9 virus
is now present across most of China. As this virus does not cause obvious 
symptoms in chickens” and only limited surveillance has been con- 
ducted, the prevalence of this virus is likely to be higher than we docu- 
ment here, and it has had the opportunity to become enzootic over a 
wide region. Given the current pattern of dissemination, it will only be 
a matter of time before poultry movement spreads this virus beyond
China by cross-border trade, as happened previously with H5N1 and
H9N2 influenza viruses*”’.

The enzootic H5N1 and H9N2 viruses*”, along with the H7N9 virus, 
are now reassorting with other viruses in the influenza ecosystem in China, 
giving rise to novel variants such as H10N8 (ref. 22), H10N6 (ref. 23), 
H5N8 (ref. 24), H5N6 (ref. 25) and H7N6 (this study). This expansion
of the genetic diversity of influenza viruses in China means that, unless
effective control measures are put in place (such as permanent closure of
live poultry markets, central slaughtering and a ban on inter-regional
poultry transportation during disease outbreaks) and backed by systematic
surveillance, it is reasonable to expect the H7N9 and other viruses
to persist and cause a substantial number of severe human infections.
H7 is the only subtype, other than the pandemic subtypes, that has been 
established in mammals (Equine-1, H7N7)”°. Therefore, H7N9 viruses 
should be considered as a major candidate to emerge as a pandemic 
strain in humans. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Influenza surveillance at the live poultry markets and sample processing. Sur- 
veillance of influenza viruses was conducted in live poultry markets in 13 cities in 
Zhejiang (October 2013), Jiangsu (October 2013), Shandong (October 2013) and 
Guangdong (December 2013 and February 2014) provinces during the second wave 
of the H7N9 outbreak. Samples were also collected on a weekly basis from routine 
surveillance in Nanchang (Jiangxi) and Shantou (Guangdong) cities during Octo- 
ber 2013 to July 2014 and included in this study (Extended Data Tables 1 and 2). In 
the live poultry markets, oropharyngeal and cloacal swabs were taken from appar- 
ently healthy chickens and ducks. Faecal droppings were also collected from the 
duck holding areas at the markets of Nanchang. 

All samples were collected in individual vials, placed in transport medium with 
antibiotics and packed on ice before sending to the laboratory for further proces- 
sing. Samples were inoculated in 9- to 11-day-old embryonated chicken eggs for 
48 h at 37 °C to assess the presence of influenza viruses. Haemagglutinin-positive 
isolates were further subtyped by haemagglutination inhibition using a panel of 
reference anti-sera as described previously. 

Influenza surveillance at the Shenzhen hospitals. Influenza surveillance was con- 
ducted, with informed consent, among patients admitted to hospitals in Shenzhen 
with acute and rapidly progressing pneumonia resistant to antibiotic therapy since 
December 2013. This study was approved by the Ethics Committee of the Shenzhen 
Third People’s Hospital and the Health Bureau of Shenzhen. Nasal, oropharyngeal 
swabs and/or tracheal aspirate samples from each patient were collected into trans- 
port medium and sent to the diagnostic labs within two hours. RNA was extracted 
using the QIAamp Viral RNA Minikit (Qiagen) and tested for influenza virus pres- 
ence using the diagnostic real-time RT-PCR protocol for H7N9 and the seasonal 
influenza viruses, following the World Health Organization guidelines (ref. 27). Each 
clinical sample was separately inoculated into 9- to 10-day-old embryonated chicken 
eggs and Madin-Darby canine kidney (MDCK) cells for virus isolation. Whole 
genomic sequences were obtained from either the virus isolates or directly from 
the original clinical samples via next generation sequencing. 

Genome sequencing. Based on the haemagglutination inhibition results, poultry 
H7-positive isolates were selected for sequencing. Three to five H9-positive isolates from 
each sampling occasion were also selected. For human samples positive for H7N9 
but for which we failed to obtain virus isolates, whole genomic sequencing was 
attempted using the clinical specimen. Sequencing was performed using a Roche 
454 Genome Sequencer Junior, giving ~150× coverage of the influenza genome 
on average. Original reads from the 454 sequencing were assembled into contigs 
in Lasergene, version 9.0 (http://www.dnastar.com), using overlapping regions of 
40 nucleotides with >90% identity. Samples containing more 
than one subtype of HA or NA gene, or having two or more copies of the same in- 
ternal gene sharing <97% identity were considered as mixed infections. In all other 
cases, the discordant base calling in the gene was coded with degenerate nucleotide 
characters. 
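
The mixed-infection rule above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and computing identity position by position on pre-aligned, equal-length contigs is an assumption of the sketch rather than a description of the Lasergene workflow.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_mixed_infection(ha_subtypes, na_subtypes, internal_copies, threshold=97.0):
    """Apply the rule stated in the text.

    ha_subtypes / na_subtypes: subtype labels found in the sample.
    internal_copies: gene name -> list of aligned contig sequences for that gene.
    A sample is mixed if it carries more than one HA or NA subtype, or if any
    two copies of the same internal gene share less than `threshold` % identity.
    """
    if len(set(ha_subtypes)) > 1 or len(set(na_subtypes)) > 1:
        return True
    for copies in internal_copies.values():
        for a, b in combinations(copies, 2):
            if percent_identity(a, b) < threshold:
                return True
    return False


if __name__ == "__main__":
    reference = "A" * 100
    divergent = "A" * 96 + "C" * 4  # 96% identical to the reference
    print(is_mixed_infection(["H7"], ["N9"], {"PB2": [reference, divergent]}))
```

Copies that share 97% identity or more are not flagged here; in the text such discordant positions are coded with degenerate nucleotide characters instead.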

Sequence alignment and phylogenetic analysis. Nucleotide sequences generated 
in this study were combined with all publicly available influenza A virus 
sequences in GenBank (http://www.ncbi.nlm.nih.gov/genbank) and 
GISAID (http://www.gisaid.org) databases (the full list is available from 
http://dx.doi.org/10.5061/dryad.5q7kf). Sequences were aligned using MUSCLE v3.5 (ref. 28) 
with manual adjustments. Sequences with potential mosaic patterns (ref. 29) or an 
excessive number of ambiguous bases (>0.5% of the gene length) were excluded from the 
alignments. A smaller subset of reference sequences that were phylogenetically 
related to the H7N9 and our sequences was selected based on the panoramic 
phylogeny of each gene segment, as described previously. These refined data sets were 
then used to estimate maximum likelihood phylogenies using the GTR+Γ 
nucleotide substitution model in PhyML v3.0. Phylogenetic robustness was evaluated 
using the Shimodaira-Hasegawa approximate likelihood ratio test. Large clades 
(denoted as clades 1-3) were identified in the H9N2 ZJ-HJ/07 sub-lineage of each 
maximum likelihood phylogeny of the internal genes. These clades were further 
classified into sub-clades (for example, clades 1.1, 1.2, 1.3, 2.1, etc.) that met the 
following criteria: (1) they contained at least three H7N9 sequences; and (2) their 
average intra-sub-clade genetic distance was less than 3%. The major clade where 
the majority of the first wave H7N9 viruses were found was denoted as clade 1.1. 
Clade numbers were assigned to achieve consistency with an earlier study. 
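
Two of the numerical rules in this section, the 0.5% ambiguity cut-off and the two sub-clade criteria, lend themselves to a short sketch. The code below is a hedged illustration: the function names are hypothetical, and the uncorrected pairwise distance (p-distance) is an assumption, since the text does not say which genetic distance was used.

```python
from itertools import combinations

# IUPAC codes that stand for more than one nucleotide.
AMBIGUITY_CODES = set("RYSWKMBDHVN")


def too_ambiguous(seq: str, max_fraction: float = 0.005) -> bool:
    """True if ambiguous bases exceed 0.5% of the gene length."""
    ambiguous = sum(base in AMBIGUITY_CODES for base in seq.upper())
    return ambiguous / len(seq) > max_fraction


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def qualifies_as_subclade(sequences, is_h7n9, min_h7n9=3, max_mean_distance=0.03):
    """Check both criteria: at least three H7N9 sequences and a mean
    pairwise distance below 3% within the candidate sub-clade."""
    if sum(is_h7n9) < min_h7n9:
        return False
    pairs = list(combinations(sequences, 2))
    mean_distance = sum(p_distance(a, b) for a, b in pairs) / len(pairs)
    return mean_distance < max_mean_distance


if __name__ == "__main__":
    candidate = ["ACGT" * 10, "ACGT" * 10, "ACGA" + "ACGT" * 9]
    print(qualifies_as_subclade(candidate, [True, True, True]))
```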

Phylogenetic inference of spatial and temporal dynamics. The aligned H7 and 
N9 gene sequences of each virus were concatenated (nine strains were omitted 
because their H7 and N9 genes had different clade origins). In addition, the sizes of 
internal gene data sets were reduced by removing H9N2 sequences that are distant 
from the H7N9 sequences, and identical sequences from the same sampling occa- 
sions, so that sophisticated phylogeographic analysis is tractable. Sequences were 
coded with the isolation dates and discrete states that represented the Chinese prov- 
ince of sampling (Shanghai, Jiangsu, Zhejiang, Jiangxi, Guangdong, Shandong, and 
other provinces), subtype population (H7N9 and H9N2) and wave of the outbreak 
(wave 1 and wave 2; for H7N9 only). These data were then used to infer the spatial 
and temporal dynamics of the H7N9 virus transmission using the Bayesian Markov 
chain Monte Carlo (MCMC) method implemented in the BEAST package (version 
1.8) employing the SRD06 nucleotide substitution model (ref. 33), a relaxed clock model 
with uncorrelated lognormal rate distribution (ref. 34), a Bayesian skyride coalescent model 
with time-aware smoothing (ref. 35), and a discrete non-reversible phylogeographic model (ref. 36). 
Multiple runs of the MCMC method were computed and combined, giving 3.9~ 
4.5 X 10° total steps for each data set, with sampling every 1,500 steps. Conver- 
gence of relevant parameters was assessed using Tracer v1.5°”. 
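
The data-set preparation described above (concatenating the H7 and N9 alignments, and dropping identical sequences from the same sampling occasion) can be illustrated as follows. This is a sketch under stated assumptions: the record layout and function names are hypothetical, and it does not reproduce the BEAST analysis itself.

```python
def concatenate_segments(h7: dict, n9: dict) -> dict:
    """Join the aligned H7 and N9 sequences of each strain.

    Both inputs map strain name -> aligned sequence; only strains present
    in both alignments are kept.
    """
    shared = h7.keys() & n9.keys()
    return {name: h7[name] + n9[name] for name in sorted(shared)}


def drop_identical_per_occasion(records):
    """Keep one copy of each distinct sequence per sampling occasion.

    records: iterable of (strain_name, sampling_occasion, sequence).
    The first record seen for each (occasion, sequence) pair is retained.
    """
    seen = set()
    kept = []
    for name, occasion, seq in records:
        key = (occasion, seq)
        if key not in seen:
            seen.add(key)
            kept.append((name, occasion, seq))
    return kept


if __name__ == "__main__":
    records = [
        ("strain_a", "market_1/2013-12", "ACGT"),
        ("strain_b", "market_1/2013-12", "ACGT"),  # duplicate, same occasion
        ("strain_c", "market_2/2014-02", "ACGT"),  # same sequence, other occasion
    ]
    for record in drop_identical_per_occasion(records):
        print(record)
```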

Genome mutations. The ancestral nucleotide sequence at each internal node of the 

27. World Health Organization. WHO information for molecular diagnosis of 
influenza virus - update http://www.who.int/influenza/gisrs_laboratory/molecular_diagnosis/en/ (2014). 

28. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time 
and space complexity. BMC Bioinformatics 5, 113 (2004). 

29. Lam, T. T.-Y. et al. Systematic phylogenetic analysis of influenza A virus reveals 
many novel mosaic genome segments. Infect. Genet. Evol. 18, 367-378 (2013). 

30. Guindon, S., Delsuc, F., Dufayard, J. F. & Gascuel, O. Estimating maximum 
likelihood phylogenies with PhyML. Methods Mol. Biol. 537, 113-137 (2009). 

31. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood 
phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307-321 
(2010). 

32. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by 
sampling trees. BMC Evol. Biol. 7, 214 (2007). 

33. Shapiro, B., Rambaut, A. & Drummond, A. J. Choosing appropriate substitution 
models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 
23, 7-9 (2006). 

34. Drummond, A. J., Ho, S. Y., Phillips, M. J. & Rambaut, A. Relaxed phylogenetics and 
dating with confidence. PLoS Biol. 4, e88 (2006). 

35. Minin, V. N., Bloomquist, E. W. & Suchard, M. A. Smooth skyride through a rough 
skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. 
Evol. 25, 1459-1471 (2008). 

36. Lemey, P., Rambaut, A., Drummond, A. J. & Suchard, M. A. Bayesian 
phylogeography finds its roots. PLoS Comput. Biol. 5, e1000520 (2009). 

37. Rambaut, A., Suchard, M. & Drummond, A. J. Tracer v1.5 http://tree.bio.ed.ac.uk/software/tracer/ (2007). 

38. Delport, W., Poon, A. F., Frost, S. D. & Kosakovsky Pond, S. L. Datamonkey 2010: a 
suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics 26, 
2455-2457 (2010). 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 1 image: six condensed phylogenies (panels a-f), each bracketed into clades 1-3 with numbered sub-clades and aligned with columns for HA & NA clades and internal gene clades; scale bars, 0.006; legend symbol for human isolates; inset map of the geographic regions of samples studied. Branch annotations legible in the extracted text include E191K, T559N, I570M, M570I, M171V, D375E, I371M and M371I.] 
Extended Data Figure 1 | Condensed phylogenies for the internal genes. 
a, PB2 (n = 1681), b, PB1 (n = 1620), c, PA (n = 1682), d, NP (n = 1733), e, M 
(n = 1696) and f, NS (n = 1707) genes. The H9N2 ZJ-HJ/07 lineage from the 
large phylogenies (available from http://dx.doi.org/10.5061/dryad.5q7kf) is 
annotated and shown. Red branches indicate the H7N9 viruses, with the 
remaining branches representing H9N2 (the majority) or other viruses. 
Mutations leading to changes in amino acid usage from wave 1 to wave 2 
(Extended Data Table 4) are shown in blue. Dashed brackets indicate the 
major clades 1-3, and vertical lines indicate their sub-clades. The background 
shading indicates the provinces from which the viruses were isolated (see inset 
map). Human samples are indicated as grey circles. 
[Extended Data Figure 2 image: panel a, time axis covering wave 2 (tick labels Jan 2014 and Jul 2014); panel b, horizontal axis "% of total sequences available in the wave" (0-50).] 

Extended Data Figure 2 | Prevalence of H7N9 reassortant variants. a, Timeline 
of reassortant variants of H7N9 viruses (n = 505) and human infections. 
Clade 1.1 was the predominant sub-clade in the first wave (see Fig. 2 and 
Extended Data Fig. 1). Symbols represent H7N9 viruses and their time of 
isolation, and the number of non-clade 1.1 internal gene segments (that is, those 
falling outside clade 1.1 as defined in the phylogenies; Extended Data Fig. 1) 
in the virus (y-axis). The colours indicate the provinces of isolation of the 
viruses. Viruses from HA clades W1, W2-A, W2-B and W2-C are indicated 
by triangles, circles, squares and diamonds, respectively. Solid and empty 
symbols represent avian and human viruses. The underlying blocks give the 
number of human infection cases per week (WHO data, as of July 2014). b, The 
percentage of wave 1 and wave 2 viruses having different numbers of non-clade 
1.1 internal genes (y-axis) in their genomes. 
Extended Data Table 1 | Surveillance in apparently healthy chickens at live poultry markets 

                                                           Number of samples positive for 
Province   City      Time period  Samples#  Isolates  Influenza  H7* (%)      H9† (%) 
Jiangsu    Suzhou    Oct, 2013    227       103       81         -            81 (35.7) 
           subtotal               227       103       81         -            81 (35.7) 
Zhejiang   Huzhou    Oct, 2013    457       144       110        21 (4.6)     88 (19.3) 
           Jiaxing   Oct, 2013    447       102       68         1 (0.2)      66 (14.8) 
           Ningbo    Oct, 2013    517       47        36         -            36 (7.0) 
           Shaoxing  Oct, 2013    873       235       157        14 (1.6)     142 (16.3) 
           Wenzhou   Oct, 2013    538       89        60         -            60 (11.2) 
           subtotal               2,832     617       431        36 (1.3)     392 (13.8) 
Shandong   Jinan     Oct, 2013    427       63        56         -            56 (13.1) 
           Juxian    Oct, 2013    224       22        18         -            16 (7.1) 
           Qingdao   Oct, 2013    330       51        23         -            23 (7.0) 
           Rizhao    Oct, 2013    615       234       188        -            187 (30.4) 
           Yantai    Oct, 2013    309       30        16         -            16 (5.2) 
           subtotal               1,905     400       301        -            298 (15.6) 
Guangdong  Dongguan  Dec, 2013    1,644     736       501        53 (3.2)     440 (26.8) 
           Shenzhen  Dec, 2013    2,333     1,017     797        23 (1.0)     755 (32.4) 
           Dongguan  Feb, 2014    1,543     420       207        133 (8.6)    70 (4.5) 
           Shenzhen  Feb, 2014    154       16        5          -            4 (2.6) 
           subtotal               5,674     2,189     1,510      209 (3.7)    1,269 (22.4) 
Guangdong  Shantou   Oct, 2013    82        8         2          -            2 (2.4) 
                     Nov, 2013    76        14        11         -            10 (13.2) 
                     Dec, 2013    73        20        11         -            10 (13.7) 
                     Jan, 2014    101       27        21         -            20 (19.8) 
                     Feb, 2014    98        12        3          1 (1.0)      2 (2.0) 
                     Mar, 2014    115       20        8          8 (7.0)      - 
                     Apr, 2014    166       22        7          7 (4.2)      - 
                     May, 2014    133       21        13         10 (7.5)     3 (2.3) 
                     Jun, 2014    154       14        9          2 (1.3)      7 (4.5) 
                     Jul, 2014    151       30        26         8 (5.3)      18 (11.9) 
           subtotal               1,149     188       111        36 (3.1)     72 (6.3) 
Jiangxi    Nanchang  Oct, 2013    256       15        14         -            8 (3.1) 
                     Nov, 2013    436       119       108        -            105 (24.1) 
                     Dec, 2013    416       183       181        -            162 (38.9) 
                     Jan, 2014    254       124       90         -            77 (30.3) 
                     Feb, 2014    1,415     350       103        43 (3.0)     58 (4.1) 
                     Mar, 2014    312       112       101        48 (15.4)    4 (1.3) 
                     Apr, 2014    319       99        85         49 (15.4)    35 (11.0) 
                     May, 2014    329       73        50         38 (11.6)    5 (1.5) 
                     Jun, 2014    375       55        40         27 (7.2)     12 (3.2) 
                     Jul, 2014    400       24        16         8 (2.0)      5 (1.3) 
           subtotal               4,512     1,154     788        213 (4.7)    471 (10.4) 
Total                             16,299    4,651     3,222      494‡ (3.0)   2,583 (15.8) 

Oropharyngeal swabs collected from each bird are shown. 

* Including mixed infections with other subtypes. 

† Including mixed infections with other subtypes except H7. 

‡ All the H7 viruses isolated from chickens were H7N9 except for one H7N3 isolate from Huzhou. 

# Only five H7N9 isolates were obtained from cloacal swabs; three from 1,412 swabs in Shantou and two from 1,053 swabs in Nanchang. 
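
The percentages in Extended Data Table 1 are positives divided by samples collected, so they can be re-derived from the counts. The sketch below does this for the Zhejiang rows; treating a blank H7 cell as zero positives is an assumption of the sketch (it is consistent with the printed subtotal), and the names are illustrative.

```python
# Zhejiang rows of Extended Data Table 1:
# city -> (samples, H7-positive, H9-positive).
# Blank H7 cells are entered as 0, which is an assumption of this sketch.
ZHEJIANG = {
    "Huzhou": (457, 21, 88),
    "Jiaxing": (447, 1, 66),
    "Ningbo": (517, 0, 36),
    "Shaoxing": (873, 14, 142),
    "Wenzhou": (538, 0, 60),
}


def percent_positive(positive: int, samples: int) -> float:
    """Positivity rate in percent, rounded to one decimal as in the table."""
    return round(100 * positive / samples, 1)


def subtotal(rows: dict) -> tuple:
    """Column sums (samples, H7-positive, H9-positive) over all cities."""
    samples = sum(r[0] for r in rows.values())
    h7 = sum(r[1] for r in rows.values())
    h9 = sum(r[2] for r in rows.values())
    return samples, h7, h9


if __name__ == "__main__":
    samples, h7, h9 = subtotal(ZHEJIANG)
    print(samples, h7, percent_positive(h7, samples), h9, percent_positive(h9, samples))
```

The same check applied to the other provinces is a quick way to confirm which column a lone percentage belongs to when the table layout has been flattened.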
Extended Data Table 2 | Surveillance in apparently healthy ducks at live poultry markets 

                                   Birds              Number of O | C samples positive for 
Province   City      Time period   sampled  Isolates  Influenza  H7*    H9 
Jiangsu    Suzhou    Oct, 2013     13       6|2       2|1        -      - 
           subtotal                13       8         3          -      - 
Zhejiang   Huzhou    Oct, 2013     38       10|10     10|10      1|2    -|1 
           Jiaxing   Oct, 2013     36       9|13      7|11       -      - 
           Ningbo    Oct, 2013     38       3|11      3|9        -      -|2 
           Shaoxing  Oct, 2013     95       13|16     11|10      1|-    -|1 
           Wenzhou   Oct, 2013     35       7|6       4|5        -      - 
           subtotal                242      98        80         4      6 
Shandong   Jinan     Oct, 2013     10       -|-       -|-        -      - 
           Juxian    Oct, 2013     8        3|5       2|5        -      - 
           Qingdao   Oct, 2013     1        -|-       -|-        -      - 
           Yantai    Oct, 2013     3        -|-       -|-        -      - 
           subtotal                22       8         7          -      - 
Guangdong  Shantou   Oct, 2013     84       6|1       5|1        -      - 
                     Nov, 2013     102      5|12      5|9        -      - 
                     Dec, 2013     99       7|7       1|6        -      - 
                     Jan, 2014     118      10|18     5|6        -      - 
                     Feb, 2014     83       10|11     2|4        -      - 
                     Mar, 2014     80       2|7       1|3        -      - 
                     Apr, 2014     109      -|3       -|-        -      - 
                     May, 2014     54       -|1       -|1        -      - 
                     Jun, 2014     83       1|2       1|2        -      - 
                     Jul, 2014     127      3|7       3|6        -      - 
           subtotal                939      113       61         -      - 
Jiangxi    Nanchang  Oct, 2013     243      -|-       -|-        -      - 
                     Nov, 2013     405      38|5      38|5       -      - 
                     Dec, 2013     455      27|2      27|2       1|-    - 
                     Jan, 2014     243      3|-       1|-        -      - 
                     Feb, 2014     54†      -|7       -|1        -      - 
                     Mar, 2014     324      36|3      36|2       -      - 
                     Apr, 2014     324      4|-       4|-        -      - 
                     May, 2014     324      66|-      31|-       -      - 
                     Jun, 2014     405      2|1       1|1        -      - 
                     Jul, 2014     243      11|-      1|-        -      - 
           subtotal                3,020    205       150        1      - 
Total                              4,236    432       301        5      6 

Paired oropharyngeal (O) and cloacal (C) swabs were collected from each bird. 
* The three H7 duck isolates from Huzhou were of the H7N7 subtype, while the other two from Shaoxing and Nanchang were of the H7N3 subtype. 
† Sample size was decreased due to the closure of most markets. 
Extended Data Table 3 | Genotypes of H7N9 viruses in this study 

Genotype                Total  Avian                                       Human 
W2-B|W2-B|3|3|3|3|1|3   102    88 (A/silkie chicken/Shantou/2050/2014)     14 (A/Shenzhen/SP44/2013) 
W2-B|W2-B|3|3|1|3|1|3   88     79 (A/chicken/Shenzhen/727/2013)            9 (A/Shenzhen/SP17/2013) 
W1|W1|1|1|1|1|1|2       61     27 (A/chicken/Guangzhou/27/2013)            34 (A/Anhui/1/2013) 
W2-A|W2-A|2|1|1|1|2|1   49     43 (A/chicken/Fujian/8585/2014)             6 (A/Zhejiang/30/2014) 
W2-A|W2-A|2|2|2|1|2|1   43     43 (A/silkie chicken/Jiangxi/9469/2014)     0 
W2-A|W2-A|2|1|2|1|2|1   34     34 (A/silkie chicken/Shaoxing/5130/2013)    0 
W1|W1|1|2|1|1|1|2       26     19 (A/chicken/Jiangxi/12486/2013)           7 (A/Shanghai/07/2013) 
W1|W1|1|1|1|1|2|1       24     8 (A/environment/Hangzhou/37/2013)          16 (A/Hangzhou/3/2013) 
W2-C|W2-C|2|1|2|1|2|3   13     13 (A/chicken/Jiangxi/18482/2014)           0 
W1|W1|2|2|2|1|2|1       5      3 (A/chicken/Shanghai/S1358/2013)           2 (A/Hebei/01/2013) 
W2-B|W2-B|3|3|2|3|1|3   4      1 (A/silkie chicken/Dongguan/656/2014)      3 (A/Shenzhen/SP113/2014) 
W2-A|W2-A|2|1|1|1|1|1   4      4 (A/chicken/Shaoxing/5087/2013)            0 
W2-B|W2-B|3|3|1|2|1|3   3      2 (A/chicken/Shenzhen/742/2013)             1 (A/Shenzhen/SP-Z93/2013) 
W2-C|W2-C|2|1|2|1|2|1   3      2 (A/chicken/Jiangxi/18515/2014)            1 (A/Taiwan/1/2014) 
W1|W1|1|1|1|1|1|2       3      3 (A/environment/Guangzhou/77/2013)         0 
W1|W1|2|1|1|1|2|1       3      1 (A/chicken/Jiangsu/SC537/2013)            2 (A/Shenzhen/SP118/2014) 
W2-A|W2-A|1|1|2|1|2|1   2      2 (A/chicken/Shaoxing/2417/2013)            0 
W2-B|W2-B|1|3|1|3|1|3   2      2 (A/chicken/Shantou/4832/2014)             0 
W1|W2-B|1|1|1|1|1|1     2      1 (A/environment/Suzhou/14/2013)            1 (A/Beijing/01-A/2013) 
W2-B|W2-B|3|3|3|3|3|3   2      1 (A/chicken/Dongguan/4251/2013)            1 (A/Hong_Kong/8122430/2014) 
W2-C|W2-A|2|1|1|1|2|4   2      2 (A/Duck/Jiangxi/15044/2014)               0 
W2-C|W2-C|2|1|1|1|2|1   2      1 (A/chicken/Jiangxi/18008/2014)            1 (A/Taiwan/2/2014) 
W1|W1|1|1|2|3|2|1       2      2 (A/environment/Shanghai/S1438/2013)       0 
W1|W1|1|1|2|1|2|2       2      2 (A/chicken/Hangzhou/48-1/2013)            0 
W2-A|W2-B|2|3|1|1|1|1   1      1 (A/chicken/Shantou/4824/2014)             0 
W1|W2-B|1|3|1|1|1|1     1      1 (A/duck/Zhejiang/SC410/2013)              0 
W2-A|W2-A|2|1|1|1|2|2   1      1 (A/chicken/Shaoxing/5186/2013)            0 
W2-A|W2-A|1|1|1|1|1|1   1      0                                           1 (A/Zhejiang/22/2013) 
W2-A|W2-B|2|3|2|3|1|3   1      1 (A/chicken/Shantou/4816/2014)             0 
W2-A|W2-A|2|2|1|1|2|1   1      1 (A/chicken/Fujian/8829/2014)              0 
W2-A|W2-A|2|1|1|1|1|2   1      1 (A/chicken/Shaoxing/5086/2013)            0 


w2-B]W1/3]3]1/3|1]3 


w1]W1 


2|2|1 


W2-B|W2-B]2| 


Ww1]W1 
Ww1]W1 
Ww1]W1 


3|3|1 
1]2|2 
1Jaf1 


j1fa|4 
3/3/3113 
Avian                                       Human 
0                                           1 (A/Shenzhen/SP49/2013) 
0                                           1 (A/Shanghai/13/2013) 
1 (A/chicken/Dongguan/1124/2014)            0 
1 (A/environment/Guangzhou/73/2013)         0 
0                                           1 (A/Shandong/01/2013) 
0                                           1 (A/Shanghai/1/2013) 
1 (A/chicken/Shantou/4325/2014)             0 
1 (A/chicken/Rizhao/871/2013)               0 
0                                           1 (A/shanghai/05/2013) 
1 (A/environment/Guangzhou/238/2013)        0 
1 (A/Chicken/Suzhou/097-1/2013)             0 
1 (A/chicken/Jiangxi/18513/2014)            0 
1 (A/chicken/Shaoxing/5479/2013)            0 
1 (A/chicken/Jiangxi/18449/2014)            0 
1 (A/environment/Shanghai/S1437/2013)       0 
1 (A/environment/Shanghai/S1439/2013)       0 
1 (A/chicken/Jiangxi/14513/2014)            0 


The count of H7N9 viruses (n = 505) includes sequences from this work as well as publicly available data. Clade designations for each segment are ordered as HA|NA|PB2|PB1|PA|NP|M|NS. Genotype totals and 
example human and avian strains are shown. 
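
A genotype string in this table is therefore an eight-field record. The following sketch, with hypothetical function names, splits such a string (written with `|` separators) into named segments and extracts the six internal-gene clades.

```python
# Segment order as given in the table footnote.
SEGMENTS = ("HA", "NA", "PB2", "PB1", "PA", "NP", "M", "NS")


def parse_genotype(genotype: str) -> dict:
    """Split a genotype string such as 'W2-B|W2-B|3|3|3|3|1|3' by segment."""
    parts = [p.strip() for p in genotype.split("|")]
    if len(parts) != len(SEGMENTS):
        raise ValueError(f"expected {len(SEGMENTS)} fields, got {len(parts)}")
    return dict(zip(SEGMENTS, parts))


def internal_gene_clades(genotype: str) -> list:
    """Clade labels of the six internal genes (PB2, PB1, PA, NP, M, NS)."""
    parsed = parse_genotype(genotype)
    return [parsed[s] for s in SEGMENTS[2:]]


if __name__ == "__main__":
    print(parse_genotype("W2-B|W2-B|3|3|3|3|1|3"))
    print(internal_gene_clades("W1|W2-B|1|3|1|1|1|1"))
```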
Extended Data Table 4 | Amino acid changes from the first to second waves 

Gene  Position  Wave 1 viruses       Wave 2 viruses        Functions                            References 
PB2   191       T (0.7%:1|0)         K (0.5%:0|2) 
                K (88.3%:83|94)      E (99.5%:100|98) 
                E (11.0%:15|6) 
      559       T (14.5%:17|12)      T (99.7%:100|100) 
                N (85.5%:83|88)      N (0.3%:0|0) 
      570       M (91.0%:88|94)      M (12.0%:11|17) 
                I (8.3%:12|4)        I (87.8%:89|81) 
                L (0.7%:0|1)         V (0.3%:0|2) 
PB1   171       M (100.0%:100|100)   M (45.8%:48|32) 
                                     I (0.3%:0|0) 
                                     L (0.3%:0|0) 
                                     V (53.7%:52|68) 
      397       I (100.0%:100|100)   I (45.8%:48|32) 
                                     M (54.2%:52|68) 
      525       I (29.0%:39|17)      I (68.3%:67|77) 
                V (71.0%:61|83)      V (31.7%:33|23) 
PA    100       A (87.4%:85|91)      A (34.8%:33|50)       A100V: Increased infection and       Yamayoshi et al. 2014 
                V (12.6%:15|9)       V (64.4%:67|43)       replication in human A549 cells 
                                     I (0.8%:0|7) 
      304       D (15.4%:19|11)      D (illegible) 
                N (84.6%:81|89)      N (34.8%:33|50) 
NP    371       M (73.9%:65|85)      M (46.4%:48|32) 
                I (26.1%:35|15)      I (53.3%:51|68) 
                                     T (0.3%:0|0) 
      375       D (97.2%:96|98)      D (45.6%:47|34) 
                E (2.8%:4|2)         E (54.4%:53|66) 
NS1   27        M (87.9%:89|87)      M (9.8%:8|22) 
                K (7.4%:4|11)        K (30.2%:34|4) 
                L (4.7%:8|1)         L (60.1%:58|73) 
      80        S (94.0%:92|96)      S (38.9%:41|20) 
                T (4.0%:6|1)         T (60.3%:59|73) 
                G (2.0%:1|3)         N (0.8%:0|7) 
      111       I (93.3%:89|99)      I (39.7%:41|27) 
                V (6.7%:11|1)        V (60.1%:58|73) 
                                     M (0.3%:0|0) 
      152       D (4.0%:6|1)         D (60.3%:59|73) 
                E (96.0%:94|99)      E (39.7%:41|27) 
      212       S (93.3%:89|99)      S (39.2%:41|27)       Loss of 212-PPPPK/R motif reduces    Heikkinen et al. 2008; 
                P (6.0%:10|1)        P (60.1%:59|71)       binding to CRKI/II, CRKL, and        Hrincius et al. 2010 
                Y (0.7%:1|0)         F (0.5%:1|0)          suppression of apoptosis 
                                     L (0.3%:0|2) 
      216       T (96.0%:94|99)      T (39.4%:41|27)       As for 212 
                P (4.0%:6|1)         P (60.3%:59|73) 
                                     K (0.3%:0|0) 
H7    65*       R (79.4%:78|82)      R (21.7%:22|16)       Antigenic site E 
                K (11.4%:16|4)       K (78.3%:78|84) 
                M (8.6%:5|14) 
                V (0.6%:1|0) 

The most frequent amino acid residues are underlined. Frequencies are shown as: amino acid (overall %: % in avian sequences|% in human sequences). For example, K (88.3%:83|94) indicates that lysine is 
present in 88.3% of all sequences, in 83% of avian sequences and in 94% of human sequences. Human and avian percentages are rounded to integers. 
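
The frequency notation defined in this footnote is regular enough to parse mechanically. A minimal sketch follows; the regular expression and the field names are choices made for this illustration only.

```python
import re

# residue (overall % : avian % | human %), e.g. "K (88.3%:83|94)"
ENTRY = re.compile(
    r"^\s*([A-Z])\s*\(\s*([\d.]+)\s*%\s*:\s*(\d+)\s*\|\s*(\d+)\s*\)\s*$"
)


def parse_frequency(entry: str) -> dict:
    """Parse one table cell into residue and percentage fields."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised frequency entry: {entry!r}")
    residue, overall, avian, human = match.groups()
    return {
        "residue": residue,
        "overall": float(overall),
        "avian": int(avian),
        "human": int(human),
    }


if __name__ == "__main__":
    print(parse_frequency("K (88.3%:83|94)"))
```

Entries that the extraction has damaged do not match the pattern and raise an error, which makes the parser a convenient check on a re-keyed table.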
Extended Data Table 5 | Amino acid changes from avian to human 
H7N9 viruses 

Protein   Amino acid changes (count) 
H7        N208S (5), G552R (4), N551S (3) 
N9        N223D (4), R294K (4), V22A (4) 
PB2       E627K (55), D701N (12), I461V (4), Q591K (4), D9Y (3), S534F (3) 
PB1       E177D (3), P454L (3), S361N (3), T566A (3), V200I (3) 
PB1-F2    G70E (3), S77L (3) 
PA        E684G (3), R57Q (3), T263A (3), T618K (3), V100I (3) 
PA-X      R57Q (3), V100I (3) 
NP        R98K (5), A284T (3), I61M (3), R246K (3), V280A (3), V353G (3) 
M1        none 
M2        K97E (6) 
NS1       R35P (4), R211G (3), S80N (3), V192I (3), V6M (3) 
NS2       R42K (3), V6M (3) 

Only changes that occurred in two or more human H7N9 viruses are shown, and were deduced by 
comparing each human viral sequence with the nearest avian viral sequence, based on the phylogenies. 
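
The tally described in this footnote can be sketched as follows, assuming that each human protein sequence has already been paired with its nearest avian sequence and that the two are aligned. The pairing step (which the authors based on the phylogenies) is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter

SKIP = {"-", "X"}  # alignment gaps and unresolved residues are ignored


def substitutions(avian: str, human: str) -> list:
    """List changes such as 'E627K' between two aligned protein sequences.

    Positions are 1-based and follow the alignment columns.
    """
    changes = []
    for position, (a, h) in enumerate(zip(avian, human), start=1):
        if a != h and a not in SKIP and h not in SKIP:
            changes.append(f"{a}{position}{h}")
    return changes


def recurrent_changes(pairs, min_count: int = 2) -> dict:
    """Count each change over all (avian, human) pairs and keep those seen
    in at least `min_count` human viruses."""
    counts = Counter()
    for avian, human in pairs:
        counts.update(substitutions(avian, human))
    return {change: n for change, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    toy_pairs = [("MEKD", "MKKD"), ("MEKD", "MKKN"), ("MEKD", "MEKD")]
    print(recurrent_changes(toy_pairs))
```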
A prefrontal-thalamo-hippocampal 
circuit for goal-directed spatial navigation 


Hiroshi T. Ito, Sheng-Jia Zhang, Menno P. Witter, Edvard I. Moser & May-Britt Moser 


Spatial navigation requires information about the relationship between current and future positions. The activity of 
hippocampal neurons appears to reflect such a relationship, representing not only instantaneous position but also the 
path towards a goal location. However, how the hippocampus obtains information about goal direction is poorly 
understood. Here we report a prefrontal-thalamic neural circuit that is required for hippocampal representation of 
routes or trajectories through the environment. Trajectory-dependent firing was observed in medial prefrontal cortex, 
the nucleus reuniens of the thalamus, and the CA1 region of the hippocampus in rats. Lesioning or optogenetic silencing 
of the nucleus reuniens substantially reduced trajectory-dependent CA1 firing. Trajectory-dependent activity was 
almost absent in CA3, which does not receive nucleus reuniens input. The data suggest that projections from medial 
prefrontal cortex, via the nucleus reuniens, are crucial for representation of the future path during goal-directed 
behaviour and point to the thalamus as a key node in networks for long-range communication between cortical 
regions involved in navigation. 


Hippocampal place cells are part of an allocentric representation of 
local space that allows animals to navigate to desired locations. Place 
cells provide accurate information about current location, but it has 
remained unclear how the place-cell map is used for animals to 
navigate from their current position to a goal position elsewhere in the 
environment. To implement goal-directed navigation, previous 
studies have proposed the need for a separate representation of future 
positions that is somehow brought together with the representation 
of current location to point the network to the goal. Such pointers 
may be expressed in the activity of hippocampal place cells. When rats 
are engaged in a T-maze-based alternation task, in which they take 
left or right trajectories on alternating laps, place cells with fields on 
the stem of the maze fire at different rates on left- and right-turn 
trajectories, without changes in the position of the firing field. 
The dependence on trajectory has both retrospective and prospective 
components, reflecting both where the animal comes from and 
where it is going. However, as the animal approaches the decision 
point at the junction of the maze, the representation becomes more 
forward-oriented, often with trajectories to upcoming locations 
embedded into the representation, in addition to mere changes 
in firing rate. 

The source of trajectory information in place cells has not been 
identified. Here we used a continuous version of the T-maze 
alternation task to determine how information about succeeding choices is 
introduced in hippocampal place-cell activity. We hypothesized that 
the selection of future trajectories depends on a wider circuit 
including not only the hippocampus but also structures involved in the 
evaluation and selection of actions, such as the prefrontal cortex. 
Neurons in medial prefrontal cortex (mPFC) do not project directly to 
the hippocampus but the midline thalamic nucleus reuniens (NR), 
which has reciprocal anatomical connections with the mPFC, may 
serve as a functional bridge to the hippocampal region, since NR has 
strong terminal fields in the CA1 subfield. To address this 
possibility, we recorded and manipulated activity at various nodes of the 
prefrontal-reuniens-CA1 circuit and determined whether this circuit 
is necessary for place cells to represent upcoming trajectories. 


Trajectory-dependent firing is stronger in CA1 than CA3 
We first asked whether NR is the source of trajectory information in 
CA1. If it is, we should observe a difference in trajectory-dependent 
firing between CA1 and CA3, because NR has major excitatory 
projections to CA1 but not CA3 (refs 15, 17-19). We thus recorded place 
cells in rat CA1 and CA3 in a continuous alternation task on a 
modified T-maze (Fig. 1a, b and Extended Data Fig. 1). A total of 
363 CA1 cells and 180 CA3 cells exhibited location-specific complex 
spiking (12 and 5 rats, respectively). Within this sample, 98 CA1 cells 
and 34 CA3 cells had place fields on the central stem. All subsequent 
analysis of these cells was restricted to parts of the stem where 
there was no significant difference in the animal’s head direction, 
lateral position, or running speed between left-turn and right-turn 
trajectories. 

Many place cells in CA1 expressed several-fold changes in peak 
firing rate between left- and right-turn trajectories on the stem, 
without changes in the position of the firing field (Fig. 1c, d and Extended 
Data Fig. 2a). 54 CA1 cells (55.1%) showed significant rate changes 
that depended on trajectory (left or right) (P < 0.05 for main effect 
of trial type (left/right) in a two-way analysis of variance (ANOVA) 
with trial type and stem position as factors, and P < 0.05 post-hoc 
analysis of covariance (ANCOVA) with running speed, head 
direction and lateral position as covariates). By contrast, only six cells 
(17.7%) met the criteria for trajectory-dependent rate change in 
CA3 (Fig. 1e, f). The proportion of trajectory-dependent cells 
relative to total place cells was significantly smaller in CA3 than in CA1 
(Z = 3.78, P < 0.001, binomial test). Distributions of rate changes 
were significantly different (CA1, 32.8 ± 2.6%; CA3, 19.8 ± 3.1%; 
means ± s.e.m.; D = 0.338, P = 0.005, Kolmogorov-Smirnov test; 
Fig. 1d, f and Extended Data Fig. 2b). Place-field position did not 
change across trajectories (D = 0.125, P = 0.812). 
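
The comparison of proportions reported above can be checked with a pooled two-proportion z-test. The text names only a "binomial test"; the pooled normal approximation below is therefore an assumption of this sketch, adopted because it reproduces the reported statistic for 54 of 98 CA1 cells against 6 of 34 CA3 cells.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test.

    Returns (z, two-sided P value) for the difference between x1/n1 and x2/n2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z(54, 98, 6, 34)  # CA1 versus CA3
    print(f"Z = {z:.2f}, P = {p:.5f}")
```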


Figure 1 | Trajectory-dependent firing in CA1 but not CA3. a, Modified 
T-mazes used for the continuous alternation task. Red disk, food reward; 
arrows, running directions. b, Nissl-stained coronal sections through dorsal 
hippocampus. Red circles, tetrode tracks in CA1 and CA3. Original 
magnification, ×2.5. c, Trajectory-dependent firing in CA1. Left panel, rate 
maps for a representative CA1 place cell in the continuous alternation task 
(left to right: all laps, right-turn laps, left-turn laps). Identification number of 
animal (#16755) and unit number (TT11_1.t) are indicated on top. Right panel, 
means (solid lines) and 95% confidence intervals (shaded) for spike rates of 
a single cell across the stem of the maze. Raster plots above. Left-heading runs 
in blue, right-heading runs in red. d, Left panels, difference in mean rates 


Trajectory-dependent firing in NR 

The fact that trajectory dependence is expressed more strongly in CA1 
than in CA3 points to NR as a possible source of modulation. To 
examine whether trajectory information is represented in NR, we 
recorded spike activity in NR, simultaneously with CA1, while 
animals performed the continuous alternation task. Activity was also 
recorded in a square enclosure. Tetrodes were placed centrally in 
the rostral half of NR, where many CA1-projecting neurons are 
located (Fig. 2a and Extended Data Fig. 1). We recorded activity 
in 64 NR cells from six animals. NR neurons were active across the 
entire box, with a mean firing rate of 7.8 ± 1.3 Hz (mean ± s.e.m.). 
Spatial information in bits per spike was negligible and substantially 
lower than in CA1 (NR, 0.048 ± 0.009; CA1, 1.46 ± 0.09; Extended 
Data Fig. 2c). Nonetheless, in the continuous alternation task, NR 
neurons exhibited differential firing on left- versus right-turn 
trajectories (Fig. 2b, c). Of the NR cells 42.2% (27 out of 64) showed a 
significant rate change across alternating trajectories (Fig. 2d; 
P < 0.05 for trial type and trial type × stem position in a two-way 
ANOVA). The proportion of trajectory-modulated cells was not 
significantly lower than in simultaneously recorded CA1 cells (59.1%; 
13/22 cells; Z = 1.37, P = 0.17, binomial test) or in the entire sample 
of CA1 cells from the animals with hippocampal tetrodes (54/98 cells; 
Z = 1.61, P = 0.11). The mean change in peak firing rate on the stem 
was 22.5 ± 2.5% (left versus right; Fig. 2e). The difference between 
left- and right-turn trajectories could not be explained by differences 
in other behavioural variables (P < 0.05, ANCOVA with running 
speed, head direction and lateral position as covariates). The 
recordings thus support the idea that NR is a major source of trajectory 
information to CA1. 
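
Spatial information in bits per spike is commonly computed from an occupancy map and a firing-rate map as the sum over bins of p_i (r_i / r) log2(r_i / r), where p_i is the occupancy probability of bin i, r_i its firing rate and r the occupancy-weighted mean rate. The excerpt does not spell out the formula used, so the sketch below assumes this standard definition; the function name and inputs are illustrative.

```python
import math


def spatial_information(occupancy, rates) -> float:
    """Spatial information in bits per spike.

    occupancy: time spent in each spatial bin (any consistent unit).
    rates: mean firing rate in each bin (Hz).
    Bins with zero firing contribute nothing to the sum.
    """
    total_time = sum(occupancy)
    probabilities = [t / total_time for t in occupancy]
    mean_rate = sum(p * r for p, r in zip(probabilities, rates))
    if mean_rate == 0:
        return 0.0
    information = 0.0
    for p, r in zip(probabilities, rates):
        if r > 0:
            ratio = r / mean_rate
            information += p * ratio * math.log2(ratio)
    return information


if __name__ == "__main__":
    # A cell that fires evenly everywhere carries no spatial information,
    # whereas a cell confined to one of four equally visited bins carries 2 bits.
    print(spatial_information([1, 1, 1, 1], [5, 5, 5, 5]))
    print(spatial_information([1, 1, 1, 1], [8, 0, 0, 0]))
```

On this measure, cells that are active across the whole enclosure, as described for NR, score close to zero, while place cells with compact fields score much higher.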


between left and right trajectories for all CA1 cells with firing fields on the 


Trajectory-dependent firing in mPFC 

NR receives strong projections from mPFC, suggesting that it serves 
as a relay between mPFC and CA1. If it does, trajectory information 
in NR and CA1 may also be expressed in mPFC. To test this, we 
recorded the activity of mPFC cells in the continuous alternation task 
(338 cells, 3 animals). The cells were also recorded during free foraging 
in the square box. The recordings started with tetrodes in the dorsal 
anterior cingulate cortex and continued as the tetrodes were advanced 
to the dorsal prelimbic cortex (Fig. 3a and Extended Data Fig. 3a). 
Neurons in mPFC had firing properties similar to those of NR 
neurons in that, while they were non-selectively active throughout the 
square box, with a mean firing rate of 4.9 ± 0.4 Hz (mean ± s.e.m.) 
and minimal location-selective activity (mean spatial information: 
0.134 ± 0.012 bits per Hz; Extended Data Fig. 2c), these neurons fired 
differentially on left- and right-turn trajectories on the central stem of the 
alternation task (Fig. 3b, c). One-third of the cells (129/338 cells or 
38.2%) exhibited trajectory-dependent rate changes, in agreement with 
previous reports (Fig. 3d and Extended Data Fig. 3b, c; P < 0.05 for 
trial type and trial type × stem position in a two-way ANOVA). The 
mean change in peak firing rate between left- and right-turn trajectories 
(± s.e.m.) was 29.6 ± 1.2% (Fig. 3e). The rate change was not caused by 
differences in observed behaviour (P < 0.05 in ANCOVA with running 
speed, head direction and lateral position as covariates). Taken together, 
these observations suggest that NR shares information about past and 
present trajectory with the mPFC. 

Figure 2 | Trajectory-dependent firing in NR. a, Nissl-stained coronal 
section showing tetrode track (red circle) in NR (outline). Original 
magnification, ×2.5. b, Rate maps of a representative NR cell in the continuous 
alternation task (left to right: all laps, right-turn laps, left-turn laps). Animal 
and unit identification as in Fig. 1b. Top, colour-coded rate maps; bottom, spike 
locations (red) on trajectory (blue). c, Mean rate, 95% confidence intervals 
and raster plots for the cell in b. d, Normalized spike rate on the stem for all cells 
recorded in NR, plotted as in Fig. 1d. e, Change in spike rate between high- 
and low-rate trajectories, as in Fig. 1d but for NR cells. 


NR inactivation reduces trajectory-dependent CA1 firing 


It is not clear from the recording experiments whether trajectory- 
dependent firing in NR is necessary for trajectory-dependent firing 
in the hippocampus, or whether these patterns of activity are 
expressed independently and in parallel across multiple brain regions. 
We addressed this question using two approaches that each inter- 
rupted the mPFC-NR-CA1 loop at the level of NR. 

First we made lesions in NR using local injections of ibotenic acid 
(Fig. 4a and Extended Data Fig. 4). Animals with NR lesions did not 
exhibit detectable deficits in learning or performance on the continu- 
ous alternation task (Extended Data Fig. 5a), in agreement with pre- 
vious work showing that continuous alternation persists after large 
hippocampal lesions. We were also not able to identify changes in 
running speed or in the spectral power of local field potentials in CA1 
(Extended Data Fig. 5b, c). However, trajectory coding in CA1 was 
clearly impaired. We recorded 176 CA1 place cells from 4 animals 
with NR lesions. Neurons in CA1 from lesioned animals expressed 
little rate change between left- and right-turn trajectories (Fig. 4b, c). 
Among the 44 cells that had place fields on the stem, we found only 7 
(15.9%) that passed the criteria for trajectory-dependent rate change, 
a significant reduction compared to the proportion in CA1 cells of 
control animals (55.1%; Z = 4.36, P < 0.001, binomial test). The mean 
rate change (± s.e.m.) between left- and right-turn trajectories for 
place cells was 18.7 ± 2.7%, significantly lower than in CA1 control 
animals (32.8 ± 2.6%; D = 0.346, P < 0.001, Kolmogorov-Smirnov 
test; Fig. 4d, e and Extended Data Fig. 5d) and comparable to CA3 
control animals (19.8 ± 3.1%; D = 0.152, P = 0.744). The position 
shift of place fields between trajectories was not significantly different 
from that of control animals (D = 0.083, P = 0.982). Thus the NR 
lesion caused a selective reduction of trajectory-dependent rate differ- 
ences in CA1 cells. 

[Figure 3 image: panels labelled Right turn, Left turn, High-rate trajectories and Low-rate trajectories; axes: Stem position (cm), Normalized stem position, Peak rate change (%).] 

Figure 3 | Trajectory-dependent firing in mPFC. a, Nissl-stained coronal 
section showing tetrode positions (red circles) in the dorsal prelimbic area of 
mPFC. Original magnification, ×2.5. b, Rate maps for a representative mPFC 
cell (recorded at location in a, plotted as in Fig. 2b). c, Mean rate, 95% confi- 
dence intervals and raster plots for the cell in b. d, Normalized rate on the stem 
for all mPFC cells, as in Fig. 1d. e, Change in spike rate of mPFC cells between 
trajectories, as in Fig. 1d. 

52 | NATURE | VOL 522 | 4 JUNE 2015 

Figure 4 | Loss of trajectory-dependent firing in CA1 after NR inactivation. 
a, Nissl-stained coronal brain section showing bilateral NR lesion. Outline 
shows NR. Original magnification, ×2.5. b, Colour-coded rate maps for a 
representative CA1 place cell in an NR-lesioned animal, plotted as in Fig. 1c. 
c, Mean rate, 95% confidence intervals and raster plots for the cell in 
b. d, Normalized firing rate on the stem for all CA1 place cells from animals 
with NR lesions, as in Fig. 1d. e, Change in peak rate on the stem between left- 
and right-turn trajectories, as in Fig. 1d. f, Left, colour-coded rate maps for a 
representative CA1 place cell before, during, and after optogenetic silencing 
of NR, with separate plots for left- and right-turn trajectories, as in Fig. 1c. 
Middle, means, 95% confidence intervals and raster plots for left- and right- 
turn trajectories (blue and red, respectively). Same cell as in the left plot. 
Right, change in peak rate between left- and right-turn trajectories for CA1 
place cells with trajectory-dependent firing (as in Fig. 1d). 

The lesion experiment does not exclude the possibility that NR 
plays only a temporary role in the development of trajectory coding 
in CA1 and so may not be required after initial learning. To assess the 
need for ongoing NR activity, we used local infusion of adeno-assoc- 
iated virus to express selectively the enhanced halorhodopsin 
eNpHR3.1 in NR neurons. Neurons expressing eNpHR3.1 could then 
be inactivated optogenetically with laser application using a wave- 
length of 532 nm restricted to the time when the animals were 
engaged in the alternation task. NR spikes, recorded with tetrodes 
attached to the optic fibre, were significantly suppressed during light 
application, on average by 63.1 ± 7.0% (mean reduction ± s.e.m., 
5 to 0 s interval before silencing versus 2.5 to 5 s interval after onset 
of silencing; Extended Data Fig. 6). In six NR-implanted animals, we 
recorded simultaneously the activity of 50 CA1 cells with place fields 
on the stem. Before the laser application, 72% of the recorded CA1 
place cells (36/50) exhibited significant trajectory-dependent rate 
changes. During the laser application, the percentage of trajectory- 
dependent cells, in the same sample, was reduced to 44% (22/50; 
Z = 2.837, P = 0.005, binomial test). When the illumination was 
terminated, the percentage recovered to baseline (72%; 36/50 cells). 
We did not observe any difference in the position of place fields 
between trajectories on laser-on and laser-off trials (F = 0.02, 
P = 0.983, repeated-measures ANOVA; Extended Data Fig. 5i). 
Silencing of NR cells did not affect the animal’s behaviour, the mean 
firing rates of the place cells, or the spectral power of the local field 
potential in CA1 (Extended Data Fig. 5e-h). Taken together, these 
findings demonstrate that NR activity modulates firing rates of CA1 
cells in a trajectory-dependent manner during spatial navigation. 
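
The proportion comparisons reported in this section can be checked with a pooled two-proportion z-test. The sketch below is an assumption about how the reported "binomial test" Z values may have been computed, not a description of the authors' analysis code. The function name `two_proportion_z` is hypothetical, and the control count of 54/98 cells is assumed from the CA1 sample reported earlier (55.1%). Under these assumptions the statistic comes out close to the reported Z = 2.837 for the optogenetic comparison and Z = 4.36 for the lesion comparison.

```python
import math

def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z statistic and two-sided P value (normal approximation)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)          # common proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# Laser off (36/50) versus laser on (22/50) trajectory-dependent CA1 cells
z_opto, p_opto = two_proportion_z(36, 50, 22, 50)
# Control CA1 (assumed 54/98) versus NR-lesioned CA1 (7/44)
z_lesion, p_lesion = two_proportion_z(54, 98, 7, 44)

print(f"optogenetics: Z = {z_opto:.3f}, P = {p_opto:.3f}")
print(f"lesion:       Z = {z_lesion:.2f}, P = {p_lesion:.5f}")
```

The pooled standard error is used because the null hypothesis is that both samples share one underlying proportion.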


Prospective trajectory representation 


While we found significant correlation between trajectory choices 
and activity of neurons in mPFC, NR and CA1 (Extended Data 
Fig. 7), it remains unclear whether this information is a determinant 
of the next trajectory choice or merely a reflection of events associated 
with the preceding lap on the maze. We addressed this distinction in 
three ways. 

First, we investigated trajectory-dependent firing on error trials, or 
runs succeeded by an incorrect choice. On these runs, the representa- 
tion of the next correct destination is likely to be compromised, unlike 
influences from the preceding trajectory, which may be preserved. 
Differences between correct trials and error trials are likely to be most 
evident near the end of the stem, just before the animal makes the next 
trajectory choice. Thus, we divided the stem into equal-size bins and 
assessed decoding accuracy using mean firing rates in each bin as 
inputs to a classifier. We found that the activity of neurons in 
mPFC, NR and CA1 consistently represented the correct next traject- 
ory across stem positions on correct trials (Fig. 5a). By contrast, on 
error trials, the activity of the neuronal ensemble initially represented 
the correct succeeding trajectory but then gradually decreased to 
chance level as the animals approached the junction. A significant 
reduction of trajectory representation on error trials was observed in 
all three regions—mPFC, NR and CA1—providing further support 
for the idea that these areas are functionally coupled (main effect of 
task performance (correct versus error) in a logistic regression ana- 
lysis, with task performance and stem position as coefficients: mPFC, 
Z = 5.98, P < 0.001; NR, Z = 3.59, P < 0.001; CA1, Z = 3.43, 
P < 0.001). The disruption of trajectory representation on error trials 
indicates that the information transferred through the mPFC-NR- 
CA1 circuit is an important determinant of the animal’s succeeding 
choice of trajectory. 
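
The bin-by-bin decoding described above can be illustrated with a small sketch. It uses synthetic firing rates, a nearest-centroid rule as a stand-in linear classifier, leave-one-out cross-validation, and a one-sided binomial test against chance. All of these choices, the function names and every number in the data are assumptions made for illustration; the classifier and validation scheme actually used in the study are not specified in this section.

```python
import math
import random

def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean rate vector is closest (a linear decision rule)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_hits(rates, labels):
    """Count trials decoded correctly when each trial is held out in turn."""
    hits = 0
    for i in range(len(rates)):
        train_x = rates[:i] + rates[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, rates[i]) == labels[i]
    return hits

def binomial_p_above_chance(hits, n, chance=0.5):
    """One-sided binomial P value for at least `hits` successes out of n trials."""
    return sum(math.comb(n, k) * chance**k * (1 - chance) ** (n - k)
               for k in range(hits, n + 1))

# Synthetic data: 6 stem bins, 40 trials, 5 cells. The left/right rate
# difference grows towards the end of the stem (all values are made up).
rng = random.Random(0)
n_bins, n_trials, n_cells = 6, 40, 5
labels = ["left", "right"] * (n_trials // 2)
results = []
for b in range(n_bins):
    offset = 0.8 * b  # rate difference (Hz) between trajectories in this bin
    rates = [[rng.gauss(5.0 + (offset if y == "right" else 0.0), 1.0)
              for _ in range(n_cells)] for y in labels]
    hits = leave_one_out_hits(rates, labels)
    p = binomial_p_above_chance(hits, n_trials)
    results.append((hits, p))
    print(f"bin {b + 1}: {hits}/{n_trials} correct, P = {p:.3g}")
```

Holding each trial out before computing the class means keeps the test trial from contributing to its own prediction, which would otherwise inflate decoding accuracy.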

In a second approach, we introduced a delay of 10-15 s each time 
the animal reached the base of the stem. During this delay, access to 
the central stem was blocked. The delay was added to reduce the 
influence of working memory, or memory of the preceding lap, on 
trajectory-dependent firing when the rat subsequently ran down the 
stem of the maze. Prospective components should not be disrupted by 
this procedure. Animals learned the delayed alternation task to near- 
perfect levels (89% correct on average across 56 sessions). We 
recorded 133 cells from mPFC, 57 cells from NR, and 45 cells from 
CA1 in this task. All CA1 cells had firing fields on the stem. The 
decoding approach was then used to assess whether the firing rates 
on the stem represented the correct succeeding trajectory. During the 
delay period, the classifier was not able to decode any trajectory rep- 
resentation after the first half of the delay (Fig. 5b). After the delay, 
significant differences emerged between correct and incorrect trials, 
with decoding performance increasing towards the end of the stem 
(task performance × stem position, logistic regression analysis; mPFC, 
Z = 3.01, P = 0.002; NR, Z = 3.30, P < 0.001; CA1, Z = 3.14, 
P = 0.002). The gradual emergence of a trajectory representation rel- 
evant to the succeeding behavioural outcome suggests that activity in 
the mPFC-NR-CA1 loop represents prospective trajectory choices and 
not, in the first place, memories of the preceding path. 

Figure 5 | Prospective coding. a, Decoding of correct subsequent trajectory 
using mean firing rates on the stem as inputs to a linear classifier. The stem was 
divided into six equally sized bins (upper left panel) and decoding performance 
was compared bin by bin for trials when animals subsequently made correct 
versus incorrect choices (three remaining panels: mPFC, NR, CA1; means ± 
s.e.m.). Decoding performance was estimated against the next correct trajectory 
direction on both correct trials and error trials. Dashed lines at the top indicate 
bins with decoding performance significantly better than chance (P < 0.05, 
binomial test). b, Decoding of correct subsequent trajectory on trials with a 
10-15 s delay at the start of the stem. Symbols as in a. c, Retrospective and 
prospective components were extracted from spike rates using a subsampling 
procedure that cancelled out the contributions from one of the two 
components. Symbols as in a, but P values were estimated from the bootstrap 
distributions. d, Retrospective and prospective components in the delayed 
alternation task. Symbols as in c. 

Finally, we tried to extract analytically the prospective and ret- 
rospective components of the trajectory representation in the decod- 
ing analysis. To this end, we used a subsampled data set with the same 
number of correct trials and error trials such that the contributions of 
prospective or retrospective components were cancelled out (see 
Methods). In mPFC, the percentage of successfully decoded prospect- 
ive paths was 62.7 ± 3.6% on the continuous task and 65.6 ± 4.1% on 
the delay task (mean ± s.e.m. for the last half of the stem). The prospect- 
ive component correlated strongly with position segment, suggesting 
that it built up towards the end of the stem (continuous task, r = 0.73 
± 0.26; delay task, r = 0.79 ± 0.14; Fig. 5c, d). Neurons in NR and 
CA1 expressed a similar increase of the prospective component (cor- 
relation between stem segment and prospective component: NR, r = 
0.88 ± 0.12; CA1, r = 0.58 ± 0.29 in the continuous task; NR, r = 0.73 
± 0.13; CA1, r = 0.67 ± 0.14 in the delay task). The prospective 
component in CA1 was abolished by lesions of NR (Extended Data 
Fig. 7b). We were also able to extract a retrospective component on 
the initial part of the stem in the continuous task (Fig. 5c), but not in 
the delayed task (Fig. 5d). Thus, whereas the representation of the 
preceding trajectory was largely disrupted by increasing the demands 
on working memory, the prospective component was maintained and 
expressed consistently across the mPFC-NR-CA1 circuit. 


Discussion 


When animals plan a route to a desired location, they must estimate 
how spatial position is changed following particular movements. Our 
study points to mPFC, NR and CA1 as part of the neural circuit for 
representation of goal-directed routes or trajectories. The data suggest 
that while distinct sets of CA1 cells are activated at each spatial posi- 
tion, the distribution of firing rates among these cells collectively 
represents the animal’s intended direction of movement, and that this 
information is carried from the prefrontal cortex to CA1 through the 
midline thalamic NR. At each node of this loop, cells have firing rates 
that reflect the animal’s subsequent trajectory. Disrupting the loop at 
the level of NR substantially reduces the trajectory dependence of the 
representation in CA1. CA3 cells, which do not receive direct input 
from NR, exhibit little trajectory-dependent activity, despite the 
strong remapping seen in this subfield during changes in the sensory 
environment. Taken together, the results point to the mPFC-NR- 
CA1 circuit, and possibly indirect projections from mPFC and NR via 
the entorhinal cortex, as a key element of the circuit for map-based 
route planning. The data provide functional support for the idea that 
communication between cortical regions is mediated not only by 
direct connections but also through the thalamus. 

The findings offer some clues as to what kind of information is 
imposed on CA1 cells by signals from mPFC and NR. Previous work 
has pointed to a role for NR inputs in expression of hippocampal 
memory. Lesions of NR disrupt spatial working memory, and inac- 
tivation of mPFC inputs to NR or NR inputs to CA1 impairs 
discrimination between contexts in a fear conditioning task. The 
present results speak against a role for mPFC and NR in sensory 
context discrimination per se, because cells in these areas do not fire 
differentially unless the task involves differences in the route taken by 
the animal (Extended Data Figs 8 and 9). The trajectory-dependent 
nature of the firing was also not dependent on working memory, or 
memory of the preceding trajectory, because differential firing was 
resumed on the stem after it was blocked during the delay at the start 
of the stem. Instead the gradual increase in trajectory dependence as 
the animal approached the choice point on correct trials but not on 
error trials points to mPFC and NR as sources for information about 
the animal’s intended movement. The findings provide a possible 
source for goal-directed trajectory sequences in CA1 place cells, 
observed as sweeps of prospective spatial firing either during theta activ- 
ity at junctions in a complex T-maze or during brief periods of 
immobility when animals navigate to fixed locations in an open space. 

The fact that alternation performance was not impaired by NR 
lesions, and remains intact after hippocampal lesions, raises ques- 
tions about the function of trajectory-dependent firing. We have 
shown that trajectory-dependent firing exists in multiple brain cir- 
cuits. Trajectory information from mPFC may reach systems involved 
in motor planning and decision making directly, without passing 
through the hippocampus. This may be sufficient to enable choice 
behaviour in a simple alternation task. The copy of the trajectory signal 
that is sent to the hippocampus, via the NR, may become critical only 
when navigational decisions require combinatorial representation of 
trajectory and location (Extended Data Fig. 10). Such combinatorial 
representations were observed only in CA1. Nonlinear combination of 
information modalities has been described in individual neurons in a 
number of brain systems and is thought to increase the discrimina- 
tion capacity of downstream neurons during encoding of high-dimen- 
sional information. In the hippocampus, combinatorial coding in 
trajectory-dependent place cells may form the basis for complex nav- 
igational operations in efferent regions such as the subiculum or the 
entorhinal cortex. High-dimensional representations in trajectory- 
dependent place cells may be necessary for networks in these regions 
to classify complex position-trajectory combinations. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


Subjects. Thirty-six male Long Evans rats (400-600 g at implantation) were 
housed individually in transparent Plexiglass cages (45 cm × 30 cm × 35 cm). 
Six of the rats were implanted with tetrodes in CA1 only. Two rats were implanted 
with tetrodes in both CA1 and CA3. Three rats had tetrodes in CA3. Four rats 
received neurotoxic lesions of NR and had tetrodes in CA1. Twelve rats had 
tetrodes in NR and CA1/CA3. Three rats had tetrodes in mPFC. Six rats received 
adeno-associated virus (AAV) injections in NR; all of these animals had tetrodes 
in CA1. All rats were kept at 85-90% of free-feeding body weight and maintained 
on a 12-h light/12-h dark schedule. All behavioural training and recordings were 
performed in the dark phase. 

The experiments were performed in accordance with the Norwegian Animal 
Welfare Act and the European Convention for the Protection of Vertebrate 
Animals used for Experimental and Other Scientific Purposes. The study con- 
tained no randomization to experimental treatments and no blinding. Sample 
size (number of animals) was set a priori to three or more, considered as the 
minimum required to obtain the number of cells required for statistical power 
in the present type of data. No statistical method was used to predetermine 
sample size. 
Construction, preparation and titration of recombinant AAV (rAAV) expres- 
sing eNpHR3.1. The proviral plasmid used for packaging rAAV was flanked 
by AAV serotype-2 inverted terminal repeats (ITR). The rAAV vector contained 
both a woodchuck hepatitis virus post-transcriptional regulatory element 
and a bovine growth hormone polyadenylation signal for enhancing transgene 
transcription and expression. Transcription was regulated by a calcium- 
calmodulin-dependent protein kinase II α (CaMKIIα) promoter, and the viral 
vector, pAAV-CaMKIIα-eNpHR3.0-eYFP (a gift from K. Deisseroth), was used 
as a PCR template to generate a trafficking-enhanced opsin. A Flag tag was placed 
at the C-terminus of the opsin gene between the 20-amino-acid trafficking signal 
DYKDHDGDYKDHDIDYKDDDDK and the endoplasmic reticulum exporting 
motif FCYENEV, both derived from the inward-rectifier potassium ion channel 
Kir2.1 and introduced to improve membrane trafficking. The 17-amino-acid 
N-terminal signal peptide from the β subunit of the nicotinic acetylcholine recep- 
tor, originally used for membrane insertion in eNpHR2.0, was removed as prev- 
iously described. 

The rAAV vector was pseudo-typed with AAV1 capsid proteins. rAAV2/1 was 
prepared by co-transfection of human embryonic kidney cell line HEK293 using 
the calcium phosphate method along with the adenoviral helper plasmid pHelper 
(Stratagene). Twelve hours after transfection, the DNA/CaCl₂ mixture was 
replaced with normal growth medium. After an additional 60 h in culture, the 
transfected cells were collected and subjected to three freeze/thaw cycles. The 
clear supernatant was then purified using heparin affinity columns (HiTrap 
Heparin HP, GE Healthcare). The purified rAAV2/1 was concentrated with an 
Amicon Ultra-4 centrifugal filter 100K device (Millipore), and the viral titre 
was determined by real-time quantitative PCR using StepOnePlus Real-Time 
PCR Systems and TaqMan Universal Master Mix (Applied Biosystems). The 
titred virus was diluted and matched to 1.0 × 10'? viral genomic particles per 
ml by 1× PBS. 
Surgery, virus injection, lesions and drive implantation. The rats were anaes- 
thetized with isoflurane. Initial concentration in the induction chamber was 5.0% 
(vol/vol). Air flow was set to 1.0-1.5 l min⁻¹. For analgesia, Temgesic (bupre- 
norphine, 15 µg/300 g; RB Pharmaceuticals Limited) was administered by sub- 
cutaneous injection. Following induction of anaesthesia, the animal was fixed in a 
Kopf stereotaxic frame for electrode implantation and virus injection at 0.5-2% 
isoflurane (vol/vol), adjusted according to physiological monitoring. Holes for 
tetrode implantation were drilled on the skull. 

For tetrode recording from CA1 or CA3, animals were implanted with a 
‘hyperdrive’ with 14 independently movable tetrodes constructed from 17-µm 
polyimide-coated platinum-iridium (90-10%) wire (California Fine Wire). The 
tetrode bundle was circular. The tetrodes were implanted at anterior-posterior 
(AP): −3.8 mm from bregma, medial-lateral (ML): 3.5 mm from midline, and 
dorsal-ventral (DV): 1.0 mm below dura. Electrode tips were plated with plat- 
inum to reduce electrode impedances to 100-200 kΩ at 1 kHz. In seven animals, 
implanted for simultaneous recording from NR and CA1 and/or CA3, we used a 
split bundle of tetrodes, in order to independently target seven tetrodes (three 
independently movable double tetrodes and one reference) to NR (AP: −2.25, 
ML: 0.6) and seven tetrodes (six independently movable tetrodes and one ref- 
erence) to the hippocampus (AP: −3.25, ML: 2.5). The tetrodes were implanted 
with a 5° lateral-to-medial angle in the coronal plane. For tetrode recording in 
mPFC, a hyperdrive with a circular bundle of 14 independently movable tetrodes 
was implanted on the surface of the prefrontal cortex (AP: +3.25, ML: 0.6, DV: 
1.0, with a 5° lateral-to-medial angle in the coronal plane). The hyperdrives were 
secured to the skull with jeweller’s screws and dental cement. Two screws in the 
skull behind the lambda (above the cerebellum) were connected to hyperdrive 
ground. Following closure of the wound, the electrodes were turned into the 
cortex while signals were monitored on the recording system. The animals 
received an oral dose of the analgesic Metacam (Meloxicam, 0.1 mg per 300 g; 
Boehringer Ingelheim) during the first few days after the surgery. 

In two animals aimed for simultaneous recording from NR and CA1, we first 
implanted a ‘microdrive’ with four tetrodes targeting NR (AP: −2.0, ML: 0.6, DV: 
5.5, with a 5° lateral-to-medial angle in the coronal plane). This was followed 
by the implantation of a second microdrive above CA1 (AP: −4.0, ML: 3.2, 
DV: 1.5). One skull screw behind lambda (above the cerebellum) served as ground 
for each drive. 

For the optogenetics experiments, a solution of rAAV was injected using a 
10-µl NanoFil syringe and a 33-gauge bevelled metal needle (World Precision 
Instruments) at four sites in NR (AP: −2.0 and −2.5, ML: 0.8 mm from midline, 
DV: 6.75 and 6.25). The injection was made at an 8° lateral-to-medial angle in the 
coronal plane in order to target the central portion of NR in the coronal plane, on 
both sides of the midline. Injection volume (0.25 µl at each site) and flow rate 
(0.05 µl min⁻¹) were controlled with a Micro4 Microsyringe Pump Controller 
(World Precision Instruments). After the injection, the needle was left in place for 
ten additional minutes before it was withdrawn slowly. After retraction of the 
needle, an optic fibre (FT400UMT: 0.39 NA, core size Ø 400 µm; Thorlabs) with 
two tetrodes attached was inserted so that the tip of the fibre was approximately 
0.25 mm above NR (AP: 2.25, ML: 0.8, DV: 6, with an 8° lateral-to-medial angle in 
the coronal plane). The two tetrodes were advanced 0.75 mm beyond the tip 
of the optic fibre, targeting NR. The tetrodes were wired to the headstage con- 
nector for the recording system (Axona Ltd). After the optic fibre insertion, a 
hyperdrive with 14 independently movable tetrodes was implanted above CA1 in 
the left hemisphere (AP 4.0, ML 3.5 with a 10° lateral-to-medial angle in the 
coronal plane). 

For NR lesion experiments, ibotenic acid (Sigma-Aldrich) was dissolved in 
phosphate-buffered saline (pH 7.4, 10 mg ml⁻¹) and injected using a 10-µl 
NanoFil syringe and a 33-gauge bevelled metal needle (World Precision 
Instruments) mounted to the stereotaxic frame. Volumes of 0.1 µl of ibotenic 
acid were infused over 10 min at three stereotaxic positions in the NR of the left 
hemisphere (AP: −1.75, ML: 0.6, DV: 6.75; AP: 2.25 and 2.75, ML: 0.6, DV: 7.0), 
targeting the central portion of NR in the coronal plane. The angle of the injection 
needle was 5° in the coronal plane with the tip pointing towards the midline. The 
flow rate was 0.01 µl min⁻¹. Flow was controlled with a Micro4 Microsyringe 
Pump Controller. After the injection, the needle was left in place for 10 min. 
When the infusions were completed, the rats were immediately implanted with a 
hyperdrive aimed at CA1 in the same hemisphere (AP: 3.8, ML: 3.0) or with two 
microdrives aimed at CA1 in each hemisphere (AP: 4.0, ML: 3.2). 
Electrode turning and recording procedures. The hyperdrive was connected to 
a multichannel unity gain headstage (HS-54; Neuralynx). The output of the 
headstage was connected via a lightweight multi-wire tether and a Neuralynx 
PSR-36 commutator to a data acquisition system with 64-channel digital ampli- 
fiers (Digital Lynx; Neuralynx). Unit activity was filtered at 600 (64 taps)-6,000 
(32 taps) Hz with a FIR band-pass filter. Spike waveforms above a threshold of 
~40 µV or more (noise r.m.s. <20 µV) were time-stamped and digitized at 
32,556 Hz at 24-bit resolution for 1 ms. Light-emitting diodes on the headstage 
were tracked to obtain the animal’s position and head direction. The local field 
potential (LFP) was filtered at 1-500 Hz with a running average filter (DCO) and 
a low-pass FIR filter (64 taps). The LFP signal was digitized at 2,034 Hz. 
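
As a rough illustration of the spike capture step described above, the sketch below detects upward threshold crossings in an already band-pass-filtered trace and keeps 1 ms of samples from each crossing. It is a simplified stand-in, not the acquisition system's algorithm: the function name `extract_spikes`, the toy trace and the choice to start each snippet at the crossing sample are assumptions. Only the 32,556 Hz sampling rate, the ~40 µV threshold and the 1 ms window come from the text.

```python
SAMPLE_RATE_HZ = 32_556      # digitization rate given in the text
THRESHOLD_UV = 40.0          # approximate spike threshold given in the text
SNIPPET_SAMPLES = int(SAMPLE_RATE_HZ * 0.001)  # number of samples in 1 ms

def extract_spikes(trace_uv):
    """Return (timestamp_s, waveform) pairs for upward threshold crossings."""
    spikes = []
    i = 1
    while i < len(trace_uv):
        crossed = trace_uv[i - 1] < THRESHOLD_UV <= trace_uv[i]
        if crossed and i + SNIPPET_SAMPLES <= len(trace_uv):
            spikes.append((i / SAMPLE_RATE_HZ, trace_uv[i:i + SNIPPET_SAMPLES]))
            i += SNIPPET_SAMPLES  # skip ahead so one spike yields one snippet
        else:
            i += 1
    return spikes

# Toy trace: flat baseline with two brief 80-µV deflections (made-up values)
trace = [0.0] * 200
trace[50:53] = [80.0, 60.0, 20.0]
trace[120:123] = [80.0, 60.0, 20.0]
spikes = extract_spikes(trace)
print(len(spikes), [round(t * 1000, 3) for t, _ in spikes])
```

Skipping ahead by one snippet length after each detection is a simple way to avoid counting the same event twice; real systems typically also keep a few samples before the crossing.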

After surgery, the tetrodes were moved in small daily increments towards the 
target area while the rat was resting on a pedestal. One electrode was used to 
record a reference signal from the superficial layers of the cortex (DV: ~1 mm). 
Another electrode was used to monitor LFP, in the stratum lacunosum-molecu- 
lare for the recordings from the hippocampus, and in the dorsal thalamus (DV: 
~5 mm) for recordings from NR. The pyramidal cell layer of CA1 or CA3 was 
identified during recording by the presence of sharp waves and large-amplitude 
complex-spike activity. On the day of recording, the electrodes were not moved at 
all to maintain stable recordings. 

Microdrives for simultaneous NR and CA1 recording were connected to a 
multi-channel unity gain headstage, which in turn was connected via a counter- 
balanced cable to an Axona recording system (Axona Ltd). Unit activity was 
band-pass filtered at 600-6,000 Hz with third-order Bessel filters and amplified 
by a factor of 5,000-12,500. Spike waveforms above a threshold of ~40 µV or 
more (noise r.m.s. <25 µV) were time-stamped and digitized at 48,000 Hz at 
24-bit resolution for 1 ms. Tetrodes were lowered in 50-µm steps while the rat 
rested on the pedestal. The LFP was low-pass filtered at 500 Hz with a sixth-order 
Bessel filter. The signal was digitized at 4,800 Hz. 

For tetrode recordings with optogenetic manipulations, a 532-nm light pulse 
was generated from a DPSS laser unit (Shanghai Laser & Optics Century) with a 
patch cable (FT400UMT; Thorlabs) connected to the animal. Power density was 
20-30 mW mm⁻² at the tip of the fibre. The laser application was controlled with 
a custom-made program in MATLAB (MathWorks) through an NI-DAQ system 
(USB-6211; National Instruments). Pulse delivery depended on the animal’s 
position on the maze, which was monitored through a NetCom connection 
between MATLAB and Cheetah recording software (Neuralynx). Three to four 
weeks after the virus injection, silencing of cells with laser application was con- 
firmed with tetrodes attached to the optic fibre. Unit activity from the tetrodes on 
the optic fibre was monitored using an Axona recording system (Axona Ltd). In 
all animals tested, at least two units in NR showed a significant reduction of spike 
rate by the laser application (Extended Data Fig. 6). After the animals were 
sufficiently familiarized with the continuous alternation task, the first recording 
session (~10 min) started with an optic-fibre patch cable connected to the animal 
without laser application. After the first session, the animal was at rest for 5 min 
before the next session, when light was applied for ~10 min. To avoid unneces- 
sary photodamage to the tissue, the laser application was turned off intermit- 
tently. The laser application was always on when the animal was running on the 
central stem and the side arms, but it was turned off when the animal reached the 
bottom arm. Five seconds after the animal reached the food port, the laser 
application was restarted, which was approximately 5-10 s before the next run. 
The laser then continued to be on for the next trajectory. After the session with 
laser application, the animal was at rest on the pedestal for at least 10 min or in the 
Plexiglass home cage for ~30 min before a new session was started. The final 
session (~10 min) was conducted with the optic fibre patch cable connected 
without laser application. 
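
The position-dependent laser gating described above can be summarized as a small decision function. This is a minimal sketch of the stated rule only, written in Python rather than the MATLAB program the study used; the zone labels, the function name `laser_should_be_on` and its arguments are hypothetical.

```python
from typing import Optional

LASER_ON_ZONES = {"stem", "left_arm", "right_arm"}  # hypothetical zone labels
RESTART_DELAY_S = 5.0  # laser restarts 5 s after the animal reaches the food port

def laser_should_be_on(zone: str, seconds_since_food_port: Optional[float]) -> bool:
    """Return True if the laser should be on for the current tracked position.

    zone: maze zone containing the animal (hypothetical labels).
    seconds_since_food_port: time since the animal reached the food port on
        the bottom arm, or None if it has not reached it on this lap.
    """
    if zone in LASER_ON_ZONES:
        return True  # always on while running the stem and side arms
    if zone == "bottom_arm":
        if seconds_since_food_port is None:
            return False  # switched off on entering the bottom arm
        return seconds_since_food_port >= RESTART_DELAY_S  # back on before the next run
    raise ValueError(f"unknown zone: {zone}")

print(laser_should_be_on("stem", None), laser_should_be_on("bottom_arm", 2.0))
```

Expressing the rule as a pure function of tracked zone and elapsed time keeps it easy to test independently of the tracking and hardware layers.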

Behavioural task on the modified T-maze. Two versions of modified T-mazes 
are shown in Fig. 1a (110 cm × 110 cm square-shaped maze and 130 cm × 130 cm 
diamond-shaped maze). The mazes were constructed of 12 cm wide wooden 
runways covered by rubber sheet and with 2 cm high plastic side walls. The central 
runway (stem) was 100 cm for the square-shaped maze and 120 cm for the 
diamond-shaped maze. Additional wall strips (10 cm length, 2 cm high, 1 cm 
thickness) were added on both sides at the end portion of the stem to reduce 
the width of the runway. This helped to minimize the lateral deviation of the 
animals’ trajectories. Chocolate-flavoured cereals or cookies were provided on a small 
dish located at the centre of the bottom arm (Fig. 1a). In contrast to previous 
studies using this task, reward was always given at the same spatial position, 
irrespective of whether the animals chose left or right trajectories, such that effects 
of intended movement could be dissociated from effects of the goal location itself. 
The maze was elevated 50 cm above the ground. It was surrounded by black 
circular curtains (180 cm diameter) without any visual cues on three sides. The 
bottom side of the maze was partially open to the recording room.

Behavioural training and recordings were performed on one of the mazes. The 
maze was randomly chosen for each animal. After finishing the recordings on the 
first maze, some animals with tetrodes in NR or in the hippocampus were further 
trained and recorded on the other maze. For NR recordings, the same units were 
typically active across mazes. In those cases, cells were included in only one of
the data sets. For recordings from CA1 or CA3, we often observed global remap- 
ping of place cells after changing the shape of the maze, and sometimes new units 
were recruited on the stem. 

Behavioural training started after recovery from surgery. Training started with 
1 or 2 days of accommodation where each rat was placed on the maze to freely 
explore and find food at the food port. In the next stage of training, the animals 
were instructed to follow a specific direction on the maze—from the stem through 
a side arm to the food port—by blocking reverse movement with the experimen- 
ter’s hands when necessary. Food was available at the food port irrespective of 
which trajectory the animal chose at this stage. After the animal was familiar with 
the movement direction rule on the maze, the final stage was to acquire the 
alternation rule. Reward was provided only when the animal chose the opposite 
trajectory of the previous trial, irrespective of whether choices were correct or 
incorrect on the preceding trial. For each day, three to five 10-min sessions were 
performed. Criterion was reached when choices were correct on 90% of the trials. 
Trajectory-dependent firing continued to be expressed long after the animals 
reached the behavioural criterion (for up to 2-3 months). Trajectory dependence 
emerged without the use of a barrier to instruct correct alternation during the 
training stage.

The number of left- and right-preferring neurons (neurons with higher firing 
rate on trajectories that led to left versus right turns) was balanced (CA1: 48 versus 
50 cells; CA3: 16 versus 18 cells; CA1 with NR lesions: 24 versus 20 cells; NR: 34 
versus 30 cells; mPFC: 176 versus 162 cells). There was no difference in the 
number of left- and right-preferring cells within trials (binomial tests with 
Bonferroni correction, P > 0.05). 

For the delayed alternation task in Fig. 5b, a delay period of either 10 or 15 s was 
introduced before the animals started running on the stem. The maze was
equipped with a manually controlled plastic door (25 × 25 cm) on the central
stem approximately 25 cm from the start of the stem. In addition, plastic walls 
(25 cm high) were inserted on both sides of the delay zone to minimize lateral 
movement (Extended Data Fig. 7c). The animal’s movement was continuously 
monitored with a custom-made program in MATLAB with a NetCom connection
to the recording system. When the animal entered the delay zone, a counter was 
started by the program. Criterion was reached when choices were correct on 80% 
of the trials. 

Spike sorting and cell classification. All main analyses were performed using 
MATLAB (MathWorks). Spike sorting was performed offline using MATLAB- 
based graphical cluster-cutting software, MClust (A.D. Redish). Clustering was 
performed manually in two-dimensional projections of the multidimensional 
parameter space (consisting of waveform energies and peak-trough amplitude 
differences). Autocorrelation and cross-correlation functions were used as addi- 
tional separation tools. For recordings in CA1 or CA3, putative pyramidal cells 
were distinguished from putative interneurons by spike width and average rate 
and the presence of bursts. In the continuous alternation task, only cells with a 
peak firing rate more than 1 Hz on the central stem of the maze on either 
trajectory were analysed. In the open field, all units with an average firing rate 
above 0.2 Hz in at least one of the sessions were used for the further analysis. 

To ensure the same cell was not counted multiple times, for recordings in CA1
and CA3, the estimated number of cells recorded on each tetrode was generally
based on a single recording session (with tetrodes placed optimally in the cell
layer). For a few exceptional animals, a second recording session was conducted
in the other shape of the T-maze, but in these cases, only new clusters at the same
tetrode position were included. For recordings from NR and mPFC, discrete units
were sampled from recording sessions with at least 40 µm separation from the
preceding and succeeding recording locations.

Trajectory-dependent firing on the modified T-maze. For analysis of trajectory-dependent
firing on the stem, we first extracted a portion of the stem where
the animal’s running speed, head direction and lateral position were not signifi- 
cantly different between left and right trajectories. 95% confidence intervals for 
multiple comparisons of six bins (with Bonferroni correction) were determined 
for lateral position on left- and right-turn trajectories and the portion of the 
central stem with overlapping confidence intervals was extracted for analysis. 
A segment of 5 cm was further excluded from the top end of the extracted stem 
portion to guarantee minimal trajectory deviation. For the remaining portion of 
the stem, we examined the 95% confidence intervals for running speed on left- 
and right-turn trajectories. If necessary, the trial with the largest deviation of 
running speed was excluded iteratively until the confidence intervals between 
trajectories overlapped across the entire selected portion of the stem. The same 
procedure was applied for head direction. 

To analyse trajectory-dependent firing on the central stem of the T-maze, we 
divided it into six equally sized bins. The length of individual bins was 8-13 cm, 
depending on the selected portion of the stem. The following parameters were 
calculated for each bin of each trial: (1) firing rate: the number of spikes divided by 
the amount of time spent in the bin; (2) running speed: the averaged position shift 
per time in the bin; (3) head direction: the averaged angle of two coloured LEDs 
on the headstage; and (4) lateral position: averaged position perpendicular to the 
long axis of the central stem. 
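
The per-bin firing rate defined in (1) can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB code; the function name, the argument names and the assumption that each position sample holds until the next sample are all hypothetical.

```python
from bisect import bisect_right

def bin_firing_rates(spike_times, pos_times, pos_along_stem,
                     stem_start, stem_end, n_bins=6):
    """Firing rate per spatial bin for one trial: spikes in bin / time in bin.

    pos_times and pos_along_stem are tracking samples (seconds, cm).
    Assumption: the animal stays at a sampled position until the next sample.
    """
    width = (stem_end - stem_start) / n_bins
    dwell = [0.0] * n_bins   # seconds spent in each bin
    spikes = [0] * n_bins    # spike count in each bin

    def bin_of(x):
        # Return the bin index for a stem position, or None if off the stem.
        if x < stem_start or x >= stem_end:
            return None
        return min(int((x - stem_start) / width), n_bins - 1)

    # Accumulate the time spent in each bin.
    for k in range(len(pos_times) - 1):
        b = bin_of(pos_along_stem[k])
        if b is not None:
            dwell[b] += pos_times[k + 1] - pos_times[k]

    # Assign each spike to the bin occupied at the time of the spike.
    for t in spike_times:
        k = bisect_right(pos_times, t) - 1
        if 0 <= k < len(pos_times) - 1:
            b = bin_of(pos_along_stem[k])
            if b is not None:
                spikes[b] += 1

    # Bins that were never visited have an undefined rate (NaN).
    return [s / d if d > 0 else float("nan") for s, d in zip(spikes, dwell)]

rates = bin_firing_rates(
    spike_times=[0.1, 0.2, 0.7],
    pos_times=[0.0, 0.5, 1.0, 1.5, 2.0],
    pos_along_stem=[5, 15, 25, 35, 45],
    stem_start=0, stem_end=60)
```

In this toy trial the first bin contains two spikes in 0.5 s and the second bin one spike in 0.5 s; unvisited bins are returned as NaN rather than zero so that they are not mistaken for silent bins.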

For each cell, a two-way ANOVA was conducted with trial type (correct left- 
and right-turn run) and six bins as independent variables and firing rate as the 
dependent measure. In the hippocampus, cells with a significant main effect of 
trial type were identified as potential trajectory-dependent cells. For these cells, a 
second analysis was performed to examine whether variations in speed, heading, 
or lateral position might account for the differences in firing rate between trial 
types. This was examined with a two-way ANCOVA with trial type and bins as 
the independent variables, firing rate as the dependent measure, and speed, head 
direction, and lateral position as covariates. Cells that continued to show a sig- 
nificant difference in firing rate between left- and right-turn trials, when the 
covariates were included in the ANCOVA model, were classified as trajectory 
dependent. In NR or mPFC, any cell that showed either a significant main
effect of trial type or a significant trial type × bin interaction with both ANOVA
and ANCOVA was considered a trajectory-dependent cell.

To create spatial rate maps, spatial positions in the maze were divided into 
10 × 10 pixel bins (3 pixels per cm) and the firing rate for each bin was calculated.
This was performed only for periods when the animal’s running speed exceeded
10 cm s⁻¹. Instantaneous spike rates were estimated using a Gaussian kernel on
the spike data for temporal smoothing. Instantaneous rate was calculated as
the sum, over all spikes, of a kernel centred on each spike time,
where $g$ is a 1D Gaussian kernel, $h$ is the bandwidth, $N$ is the total number of spikes,
and $t_i$ is the time of the i-th spike. An optimal bandwidth between 50 and 250 ms
was determined for each cell by minimizing the mean integrated square error 
between the estimated rate and the unknown underlying rate. The rate map was 
smoothed using a 2D Gaussian filter with a bandwidth of one bin (3.3 cm × 3.3
cm). Bins visited for less than 40 ms were excluded. Spike rate at each stem position
was estimated by linear interpolation of the spike rates temporally
smoothed with the 1D Gaussian kernel. Mean values and 95% confidence
intervals of the spike rates were calculated for left- and right-turn trajectories.
Peak firing rate and peak firing position were determined.
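
The temporal smoothing step can be illustrated with a short Python sketch. It assumes a standard normal kernel and a 1/h normalization so that the estimate is in spikes per second; neither detail is stated above, and the function names are hypothetical. The bandwidth selection by minimizing the mean integrated square error is not reproduced here.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the 1D Gaussian kernel g."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def instantaneous_rate(t, spike_times, h):
    """Kernel estimate of the spike rate (Hz) at time t.

    h is the bandwidth in seconds. The 1/h factor is an assumption that
    makes each spike contribute exactly one spike when integrated over time.
    """
    return sum(gaussian_kernel((t - ti) / h) for ti in spike_times) / h

# A single spike at t = 0 s, smoothed with a 100 ms bandwidth.
peak = instantaneous_rate(0.0, [0.0], 0.1)
# Integrating the estimate over time should recover the spike count.
area = sum(instantaneous_rate(k * 0.001 - 1.0, [0.0], 0.1)
           for k in range(2001)) * 0.001
```

With this normalization the integral of the estimated rate equals the number of spikes, which is a convenient sanity check when trying different bandwidths in the 50 to 250 ms range.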

Decoding analysis. A linear decoder, expressed by the following equation, was 
used to predict the next trajectory from the spike rates on the stem: 


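$$y = b + w_1 \times F_1 + w_2 \times F_2 + w_3 \times F_3 + \dots = b + \mathbf{w}^{\mathsf T}\mathbf{F}$$

$$\mathbf{F} = [F_1, F_2, F_3, \dots]^{\mathsf T}, \quad \mathbf{w} = [w_1, w_2, w_3, \dots]^{\mathsf T}$$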


$\mathbf{F}$ is a vector of firing rates for each cell, $\mathbf{w}$ is a vector of respective weights, $b$ is a
scalar offset, and $y$ is the output value of the classifier (1 or −1). Optimal weights of
the decoder were determined by a support vector machine algorithm, which maximizes
the separation margin to improve generalization performance on any data set and
to avoid over-fitting to the training data used for the weight optimization. In
brief, for a given number $N$ of trials of rate-trajectory pairs, $(\mathbf{F}_i, y_i)$, $i = 1, 2, 3, \dots, N$,
we searched for $\mathbf{w}$ that satisfies the following condition,


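$$\min_{\mathbf{w},\,b,\,\xi} \; \frac{1}{2}\mathbf{w}^{\mathsf T}\mathbf{w} + C \sum_{i=1}^{N} \xi_i$$

$$\text{subject to } y_i\,(b + \mathbf{w}^{\mathsf T}\mathbf{F}_i) \geq 1 - \xi_i, \quad \xi_i \geq 0$$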


C is a penalty parameter for misclassification. We set C to 1 throughout the
decoding analysis, but changing C to 0.1, 10 or 100 did not significantly
affect any conclusion. The mean firing rates in each bin were used as the
inputs to the classifier.

Decoding performance was estimated using a leave-one-out cross-validation, 
performed as follows. From a given set of trials in a recording session, one trial 
was randomly chosen as a test data set and the rest of the trials were used as a 
training data set. The weights of the linear classifier were optimized based on a 
training data set, and the same weights were applied to the test data for classifica- 
tion. This procedure was repeated for all trials to be tested, and the classification 
accuracy on the test data sets was considered an estimate of the decoding per- 
formance. The decoding performance on error trials was calculated using the 
weights optimized for all correct trials in the same recording session. 
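
The leave-one-out procedure can be sketched in Python as below. This is an illustrative stand-in rather than the authors' code: the soft-margin objective is the one given above, but it is minimized here by plain full-batch subgradient descent instead of a dedicated support vector machine solver, and the function names, learning rate, number of epochs and toy firing rates are all assumptions.

```python
def train_linear_svm(X, y, C=1.0, epochs=3000, lr=0.001):
    """Minimise 0.5*w.w + C*sum(hinge losses) by subgradient descent.

    Assumption: this simple optimiser replaces a proper SVM solver.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*w.w term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = b + sum(wj * xj for wj, xj in zip(w, xi))
            if yi * score < 1.0:  # margin violated, hinge term is active
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classifier output: 1 or -1, one label per trajectory."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def leave_one_out_accuracy(X, y, C=1.0):
    """Hold out each trial in turn, train on the rest, test on the held-out trial."""
    correct = 0
    for k in range(len(X)):
        w, b = train_linear_svm(X[:k] + X[k + 1:], y[:k] + y[k + 1:], C)
        correct += predict(w, b, X[k]) == y[k]
    return correct / len(X)

# Toy data: firing rates (Hz) of two hypothetical cells on eight trials.
X = [[9, 1], [8, 2], [10, 1], [9, 2], [1, 9], [2, 8], [1, 10], [2, 9]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
accuracy = leave_one_out_accuracy(X, y)
```

Because every trial is held out exactly once, the order in which test trials are chosen does not affect the accuracy estimate.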

For decoding analysis in the delayed alternation task in Fig. 5b, the delay period 
was divided into two temporal bins, corresponding to the first and last halves, and 
the stem part was divided into four equally sized spatial bins. Decoding perform- 
ance was estimated using the mean spike rate at each bin. 

To isolate prospective and retrospective components in the alternation task, we 
analysed a subset of trials with an equal number of correct and error trials, so that 
we could focus on only one of the components, either prospective or retrospect- 
ive, while the influence from the other component was cancelled out due to an 
equal number of left- and right-directed trajectories. The details of this procedure 
are as follows. 

Suppose that we want to estimate the probability that the animal took a left 
trajectory on the previous trial using the spike rates on the stem. The probability 
for correct trials can then be expressed as: 


$$P(\text{last L} \mid \text{rate}, \text{correct}) = P(\text{LR} \mid \text{rate}, \text{correct}) = 1 - P(\text{RL} \mid \text{rate}, \text{correct})$$


where ‘last L’ indicates a left trajectory choice on the previous trial, ‘LR’ indicates 
the trajectory from the previous left arm to the next right arm, and ‘RL’ indicates 
the trajectory from the previous right arm to the next left arm. The first term 
indicates the probability that the animal took a left trajectory on the previous trial,
given the spike rates on the stem on correct trials. Note that, if the direction of the 
last trajectory is given, the next trajectory choice is automatically determined, 
depending on whether the trial is correct or incorrect. Similarly, the probability 
for error trials is expressed as: 


$$P(\text{last L} \mid \text{rate}, \text{error}) = P(\text{LL} \mid \text{rate}, \text{error}) = 1 - P(\text{RR} \mid \text{rate}, \text{error})$$


Now the retrospective component of the trajectory representation can be esti- 
mated using the following equation: 


$$P(\text{last L} \mid \text{rate}) = P(\text{correct}) \times P(\text{LR} \mid \text{rate}, \text{correct}) + P(\text{error}) \times P(\text{LL} \mid \text{rate}, \text{error})$$


P(correct) and P(error) are the probabilities of correct trials and error trials, 
respectively. Evaluating this equation with all data sets, however, gives a bias
towards the next right turn, because P(correct) ≫ P(error). In other words,
this equation gives P(last L, next biased to R | rate) when all trials are considered. 
To cancel out the influence of the next trajectory on the decoding performance, 
we used a subset of trials with an equal number of correct and error trials. 
This gives: 


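$$\begin{aligned}
P(\text{last L, next unbiased} \mid \text{rate}) &= P(\text{correct}) \times P(\text{LR} \mid \text{rate}, \text{correct}) + P(\text{error}) \times P(\text{LL} \mid \text{rate}, \text{error}) \\
&= \tfrac{1}{2} \times P(\text{LR} \mid \text{rate}, \text{correct}) + \tfrac{1}{2} \times P(\text{LL} \mid \text{rate}, \text{error}) \\
&\qquad \left(\because\ P(\text{correct}) = P(\text{error}) = \tfrac{1}{2}\right)
\end{aligned}$$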


Here the probabilities P(LR | rate, correct) and P(LL | rate, error) can be 
obtained using the same decoding procedure as in Fig. 5a, b. A statistical distri- 
bution for decoding performance was estimated from 1,000 randomly sampled 
subsets with an equal number of correct and error trials (a bootstrap resampling 
method). Similarly, the prospective component of the trajectory representation 
was estimated as: 


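$$\begin{aligned}
P(\text{last unbiased, next R} \mid \text{rate}) &= P(\text{correct}) \times P(\text{LR} \mid \text{rate}, \text{correct}) + P(\text{error}) \times P(\text{RR} \mid \text{rate}, \text{error}) \\
&= \tfrac{1}{2} \times P(\text{LR} \mid \text{rate}, \text{correct}) + \tfrac{1}{2} \times \left[1 - P(\text{LL} \mid \text{rate}, \text{error})\right] \\
&\qquad \left(\because\ P(\text{correct}) = P(\text{error}) = \tfrac{1}{2}\right)
\end{aligned}$$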
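
The arithmetic of the two components, and the drawing of a balanced subset, can be sketched in Python as follows. The function names and the example probabilities are hypothetical; in the analysis described above the two input probabilities come from the decoder, which is not reproduced here.

```python
import random

def retrospective_component(p_lr_correct, p_ll_error):
    """P(last L, next unbiased | rate) with P(correct) = P(error) = 1/2."""
    return 0.5 * p_lr_correct + 0.5 * p_ll_error

def prospective_component(p_lr_correct, p_ll_error):
    """P(last unbiased, next R | rate), using P(RR | rate, error) = 1 - P(LL | rate, error)."""
    return 0.5 * p_lr_correct + 0.5 * (1.0 - p_ll_error)

def balanced_subset(correct_trials, error_trials, rng):
    """Randomly draw equally many correct and error trials (without replacement)."""
    n = min(len(correct_trials), len(error_trials))
    return rng.sample(correct_trials, n), rng.sample(error_trials, n)

# Hypothetical decoder outputs for one resampled subset.
retro = retrospective_component(0.8, 0.6)
prosp = prospective_component(0.8, 0.6)

# One of the 1,000 random subsets; trial identifiers are placeholders.
rng = random.Random(0)
correct_subset, error_subset = balanced_subset(list(range(100)), list(range(5)), rng)
```

Repeating the subset draw many times and recomputing both components for each draw would give the kind of resampling distribution described in the text.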


To compare overall firing rates on correct trials and error trials we normalized 
firing rate on the stem on error trials to firing rates on the stem on correct trials. 
Normalized firing rates ranged from 0.90 for CA1 to 1.04 for mPFC. Only the 
CA1 group was significantly different from 1. To test the influence of the lower 
rate on error trials, the overall firing rates on error trials in the CA1 group were 
multiplicatively increased by the factor of 1/0.9 in order to match the rates on 
correct trials. This adjustment did not change the magnitude of the prospective 
and retrospective components of the firing. 

The number of trials used for decoding analysis on continuous trials (Fig. 5a)
was, for mPFC, 1,035 correct trials and 27 error trials; for NR, 1,199 correct trials
and 34 error trials; and for CA1, 1,145 correct trials and 25 error trials. The
number of cells per session was, for mPFC, 14.9 ± 0.9; for NR, 4.5 ± 0.2;
and for CA1, 6.3 ± 0.6. The number of trials for trials with a delay (Fig. 5b)
was, for mPFC, 319 correct trials and 29 error trials; for NR, 637 correct trials
and 83 error trials; and for CA1, 439 correct trials and 54 error trials. The
number of cells per session was, for mPFC, 8.8 ± 0.8; for NR, 5.2 ± 0.3; and
for CA1, 7.0 ± 0.6.

Open-field tests. Animals were also tested in a square box with individually
exchangeable walls (black on one side, white on the other side; 100 cm ×
100 cm; 50 cm high). Distal background cues were masked by black curtains
encircling the recording box (180 cm diameter). A pedestal, where the rat slept 
and rested, was placed between the test box and the experimenter outside 
the curtains. 

Rate remapping was induced in the hippocampus by changing the colour 
configuration of the recording box while the box was kept at a constant location. 
The rat was first placed into the black/white box for 10 min, then into the box with 
opposite colour for two consecutive 10-min sessions, and then back into the 
original black/white box for a final 10-min session. The rats were allowed to rest 
for 5 min on the pedestal between the sessions. While the animal was resting, the 
four walls of the box were flipped and the floor was washed with water. Animals 
with tetrodes in NR and CA1 were tested in a black-white-white-black sequence
of four sessions, using the same box. Animals with tetrodes in mPFC were 
recorded in a black-white-black sequence of three sessions. 

For all recordings in the open field, spatial rate distributions for each well- 
isolated unit were constructed by summing the total number of spikes that 
occurred in a given location bin (5 cm × 5 cm) and dividing by the amount of
time spent in that bin. An adaptive smoothing method was applied for colour- 
coded rate plots and for the calculation of spatial correlation and peak firing rate 
to optimize the trade-off between blurring error and sampling error. The firing
rate at each bin in the environment was estimated by expanding a circle around
the point until

$$r \geq \frac{\alpha}{n\sqrt{s}}$$

where r is the radius of the circle in bins, n is the number of occupancy samples
within the circle, s is the total number of spikes in those occupancy samples and
the constant α is set to 10,000. With a position sampling rate of 50 Hz, the
firing rate at that point was then set to 50·s/n. The maximum value in the smoothed
rate map was taken as the peak firing rate of the cell. Spatial information 
for individual cells was calculated from spike rate maps, using the following 
equation: 


$$\sum_i p_i \, \frac{\lambda_i}{\lambda} \log_2 \frac{\lambda_i}{\lambda}$$

where $\lambda_i$ is the mean firing rate in the i-th bin, $\lambda$ is the overall mean firing rate and
$p_i$ is the probability of the animal being in the i-th bin (occupancy in the i-th bin/
total recording time).
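
As a worked illustration of the spatial information measure (in bits per spike), the following Python sketch evaluates the sum for a flattened rate map. The function name is hypothetical, and computing the overall mean rate as the occupancy-weighted mean of the bin rates is an assumption.

```python
import math

def spatial_information(rates, occupancy):
    """Spatial information in bits per spike.

    rates: mean firing rate in each bin (Hz).
    occupancy: time spent in each bin (s).
    Assumption: the overall mean rate is the occupancy-weighted mean of rates.
    """
    total_time = sum(occupancy)
    p = [t / total_time for t in occupancy]
    mean_rate = sum(pi * ri for pi, ri in zip(p, rates))
    if mean_rate <= 0:
        return 0.0
    info = 0.0
    for pi, ri in zip(p, rates):
        if ri > 0:  # bins with zero rate contribute nothing to the sum
            ratio = ri / mean_rate
            info += pi * ratio * math.log2(ratio)
    return info

# A cell that fires in only one of four equally visited bins.
place_like = spatial_information([4.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
# A cell that fires uniformly everywhere carries no spatial information.
uniform = spatial_information([2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0])
```

A cell confined to one of four equally occupied bins yields 2 bits per spike, whereas spatially uniform firing yields 0.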

Firing patterns were compared across trials with a spatial correlation proced- 
ure. Pearson correlation was measured between the firing rates in common pixels 
of the two maps for each cell. For comparison of peak rate, the difference of peak 
rates between boxes divided by the peak rate across the sessions was calculated for 
each cell. The average value obtained from all possible combinations of sessions 
with either the same or different colour boxes was considered as representative for 
each cell. For example, for the sessions with a black 1, white 2, white 3, black 4 
sequence, the average value of correlation or rate change between similar session 
pairs (black 1/black 4 and white 2/white 3) was used as a representative value for 
the same colour comparison, while the average among different session pairs 
(black 1/white 2, black 1/white 3, white 2/black 4, white 3/black 4) was used for 
the different colour comparison. 
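
The pairing of sessions into same-colour and different-colour comparisons can be sketched in Python as follows. The function names and the toy rate maps are hypothetical, and marking unvisited bins with None is an assumption about how common pixels are identified.

```python
from itertools import combinations
import math

def pearson_common_pixels(map_a, map_b):
    """Pearson correlation over bins visited in both (flattened) rate maps.

    Assumption: unvisited bins are marked with None.
    """
    pairs = [(a, b) for a, b in zip(map_a, map_b)
             if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)

def same_vs_different(maps, colours):
    """Average the correlation over same-colour and different-colour session pairs."""
    same, different = [], []
    for i, j in combinations(range(len(maps)), 2):
        r = pearson_common_pixels(maps[i], maps[j])
        (same if colours[i] == colours[j] else different).append(r)
    return sum(same) / len(same), sum(different) / len(different)

# Toy example for one cell in a black 1, white 2, white 3, black 4 sequence.
maps = [[1, 2, 3], [3, 2, 1], [3, 2, 1], [1, 2, 3]]
colours = ["black", "white", "white", "black"]
same_r, different_r = same_vs_different(maps, colours)
```

For the four-session sequence this produces two same-colour pairs and four different-colour pairs, matching the example in the text.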

Histological procedures and electrode positions. The rats received an overdose 
of pentobarbital and were perfused intracardially with saline followed by
4% formaldehyde (vol/vol). The brains were extracted and stored in formaldehyde,
and frozen coronal sections (30 µm or 40 µm) were cut and stained with
cresyl violet. Each section through the relevant brain region was collected for 
analysis. All tetrode and optic fibre traces were identified and the tip of each 
electrode was found by comparison across adjacent sections. The position and 
extent of the neurotoxic lesions in NR were outlined in Nissl-stained sections 
throughout the anteroposterior extent of NR. Lesioned tissue was defined by
the absence of either tissue or stained neurons, and included areas showing
pyknotic neurons.

Statistical tests. All statistical tests were two-sided. Data met assumptions about 
normality when parametric statistics were used.

Code availability. Code for decoding of subsequent trajectories can be obtained
from the authors.

Extended Data Figure 1 | Nissl-stained coronal sections showing tetrode positions in each animal with recordings in CA1, CA3 or NR. Positions of tetrode
tracks are indicated by red circles. Rat number, recording region and type of electrode assembly are indicated above each section.

Extended Data Figure 2 | The influence of rate variability on sorting of
high-rate and low-rate trajectories, trajectory coding in CA1 and CA3, and 
spatial properties of neurons in mPFC, NR and CA1. a, Demonstration of 
the influence of rate variability on sorting of high-rate and low-rate trajectories 
in Figs 1d, f, 2d, 3d and 4d. For each cell, the rate variability (s.d.) within the 
trajectory in which the cell exhibited a higher peak firing rate was estimated. 
Gaussian noise with the estimated s.d. was then added to the original rate data. 
The figure shows two sets of data with the addition of independent Gaussian 
random noise, sorted into high-rate and low-rate trajectories as in Fig. 1d. The 
colour-code difference between high-rate and low-rate trajectories reflects 
the rate variability within the same trajectory. These plots are substantially 
different from the original plots for CA1 (Fig. 1d), NR (Fig. 2d) and mPFC 
(Fig. 3d) but are similar to the plots for CA3 (Fig. 1f) and CA1 with NR lesions 
(Fig. 4d), indicating that colour-code differences on the latter plots can be 
largely accounted for by the rate variability within the same trajectory. b, Box
plot showing CA1-CA3 difference in change of peak rate (left) but not field
position (right) in the continuous alternation task. *P < 0.05. c, Distribution
of spatial information across cells in CA1, NR, and mPFC (frequency
histogram and box plot; *P < 0.05). Spatial information per spike was
significantly higher in CA1 neurons than in NR or mPFC neurons (CA1,
1.46 ± 0.09; NR, 0.048 ± 0.009; mPFC, 0.134 ± 0.012 bits per spike
(mean ± s.e.m.); CA1 versus NR, P < 0.001, D = 0.98; CA1 versus mPFC,
P < 0.001, D = 0.93, Kolmogorov-Smirnov test). Spatial information per spike
was also higher in mPFC than in NR (D = 0.36, P < 0.001, Kolmogorov-Smirnov
test) but this difference was not significant when measured in
bits per second (D = 0.07, P = 0.19), indicating that the difference per spike
is largely due to higher firing rates in NR (NR, 7.83 ± 1.27 Hz; mPFC,
4.86 ± 0.39 Hz (mean ± s.e.m.)). The total number of neurons analysed was
71 in CA1, 61 in NR, and 164 in mPFC.

Extended Data Figure 3 | Tetrode positions and localization of trajectory-dependent
cells in mPFC. a, Positions of tetrode tracks are indicated with red
circles. Rat numbers are indicated. b, Spike waveform widths (peak-to-trough 
time) and mean spike rates on the stem (both trajectories combined) were 
plotted for each cell in mPFC. Trajectory-dependent cells are indicated in red 
and trajectory-independent cells in blue. No significant difference was observed 
in spike widths of the two cell types (D = 0.071, P = 0.802, Kolmogorov- 
Smirnov test) but the mean spike rates of trajectory-dependent cells were 
weakly, but significantly, higher than those of trajectory-independent cells
(trajectory-dependent cells, 8.10 Hz; trajectory-independent cells, 5.87 Hz;
D = 0.172, P = 0.016, Kolmogorov-Smirnov test). c, Percentage of trajectory-dependent
cells in superficial versus deep layers of mPFC and in prelimbic
versus dorsal anterior cingulate areas. The percentage of trajectory-dependent 
neurons was slightly larger in the dorsal anterior cingulate area than in the 
prelimbic area (45.3% versus 33.2%, z = 2.26, P = 0.024, binomial test). 
There was no significant difference between superficial and deep layers (32.0% 
versus 41.0%, z = 1.54, P = 0.125).

Extended Data Figure 4 | Nissl-stained coronal sections showing tetrode
positions in CA1 of animals and the extent of their NR lesions. Positions of
tetrode tracks in CA1 are indicated by red circles. Right sections show
outlines of the lesioned areas (orange) and NR (black dashed line) at different
anterior-posterior levels. Note that all lesions are bilateral. Percentage of NR
lesions (lesioned NR area/total NR area): #15465, 73%; #16214, 68%; #16249,
85%; #16337, 70%. Percentage of lesioned areas specific to NR (lesioned
NR area/total lesioned area): #15465, 71%; #16214, 80%; #16249, 73%;
#16337, 80%.

Extended Data Figure 5 | Effect of removal of NR input on behaviour, spike
rates and local field potentials in CA1. a, Left, behavioural performance
(percentage of correct trials) on the continuous alternation task in control
animals and animals with NR lesions. No significant difference was observed
(χ²₁ = 1.76, P = 0.184; Kruskal-Wallis test). Right, number of days required for
animals to reach behavioural criterion (correct trials >90%; t14 = 0.236,
P = 0.817). b, No significant difference was observed between control and NR-
lesioned animals in mean running speed on the stem (t9 = 0.403, P = 0.692).
c, The mean spectral power of the local field potentials recorded in the CA1 
pyramidal layer was not different between control and lesioned animals. The 
plots show mean values (solid lines) with 95% confidence intervals (shaded 
areas). d, Left, change in field position across alternating trajectories for CA1 
place cells recorded in control animals and animals with NR lesions. Right, box 
plots showing difference in change of peak rate between CA1 in control animals 
and CA1 in animals with NR lesions (top; D = 0.346, P < 0.001, Kolmogorov-Smirnov
test). There was no corresponding change in field position (bottom;
D = 0.083, P = 0.982, Kolmogorov-Smirnov test). *P < 0.05. e, Behavioural
performance did not change during laser stimulation in NR in eNpHR-expressing
animals (χ²₂ = 2.77, P = 0.250; Friedman test). f, Running speed
on the stem did not change significantly during laser application (F,2. = 0.89, 
P = 0.423, repeated-measures ANOVA). g, Mean firing rates of place cells on 
the stem (both trajectories combined) did not change significantly during 
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laser application but were significantly reduced after termination of the 
stimulation (F,23 = 6.52, P = 0.002, repeated-measures ANOVA; 

before versus during: t4g = 1.651, P = 0.105, during versus after: ty, = 2.137, 
P = 0.038, post hoc paired t-test). The lack of a consistent reduction in CA1 
mean firing rate during light application probably reflects the fact that 
excitatory inputs from NR to CA1 terminate not only on pyramidal cells but
also on local inhibitory neurons. h, Spectral power of local field potentials in
the CA1 pyramidal layer was not significantly changed by laser application
in NR (before/during laser application, mean ± s.e.m., V² Hz⁻¹ in a
decibel scale: delta (1-4 Hz), 37.3 ± 0.92/37.6 ± 0.99; theta (6-11 Hz),
41.8 ± 1.08/41.4 ± 1.18; slow gamma (25-50 Hz), 31.6 ± 0.41/31.7 ± 0.42; fast
gamma (60-90 Hz), 26.1 ± 0.53/26.2 ± 0.53). The plot shows mean values
(solid lines) and 95% confidence intervals (shaded areas). i, Left, field position 
shift between alternating trajectories (frequency histograms and box plots). 
Right, box plots showing difference in change of peak rate (top; Fro, = 12.02, 
P < 0.001, repeated-measures ANOVA), but not field position (bottom; 

Fy, 9g = 0.02, P = 0.983). Before laser stimulation, the rate change between left 
and right laps among place cells that expressed significant trajectory-dependent 
firing was 52.0 ± 5.1% (mean ± s.e.m.). During stimulation, the rate change
dropped to 38.6 ± 4.4% (t33 = 4.04, P < 0.001; paired t-test, two-tailed). The
effect recovered to baseline levels after the laser application was terminated
(55.0 ± 3.9%; t33 = 4.81, P < 0.001; paired t-test).

Extended Data Figure 6 | Nissl-stained coronal sections showing positions
of tetrodes and optic fibres in optogenetic experiments. The figure is 
organized into six blocks, one for each of six animals. Five-digit animal 
numbers are indicated above each block. Top two rows of each block, positions 
of tetrode tracks (red circles) in CA1 of each animal. Bottom left, position of 
optic fibre above NR (red rectangular outline). NR is indicated by a black 
dashed triangle. The tip of the fibre was placed near the midline to silence cells 
at both sides of the midline. The bottom right panels of each block show spike 
rates for two representative units on the tetrodes attached to the optic fibre
(~750 µm from the tip of the fibre). The two panels show mean values (solid
line) and 95% confidence intervals (shaded areas) of spike rate, with spike 
rasters at the top. The 532 nm laser was applied for 5 s (0-5 s on the x axis). A 
significant reduction of spike rate was observed during laser application for all 
units in this figure (P < 0.05, t-test). Unlike the ibotenic-acid lesion, which 
destroyed 68-85% of the NR, the laser light probably reached only a small
portion of the nucleus. The total volume of NR is roughly 2 mm³ (1 mm in
width, 1 mm in height and 2 mm in length). Supposing that the tip of the optic
fibre was located 250 µm above NR, that the laser light suppressed activity up
to 1 mm below the fibre tip (optic fibre 0.39 NA, core diameter 400 µm), and
that all cells within this region were inactivated, the estimated proportion of
NR affected by the laser light would be ~36% at the most. The tetrodes attached
to the optic fibre were positioned approximately 750 µm below the fibre tip,
near the distance limit of laser light for activation of halorhodopsin. At this
depth, the intensity of the laser was probably not sufficient to activate
halorhodopsin maximally. The sub-maximal activation probably accounts for
the relatively slow time course of NR silencing (sometimes > 1 s). Another
contributing factor may be that thalamic neurons express T-type calcium
channels, which are de-inactivated by hyperpolarization, making the neurons
more excitable. This excitation may retard the suppression of NR activity.

Extended Data Figure 7 | Trajectory coding in CA1, CA3 and CA1 of NR-lesioned
animals, and influence of behavioural variables on the decoding
analysis. a, Decoding of succeeding trajectory using peak firing rates of CA1 or 
CA3 place cells with firing fields on the stem as inputs to a linear classifier. Only 
trials with correct choices are included. Decoding performance was estimated 
from randomly selected cell groups in the entire data set across animals and 
plotted as a function of number of neurons in the sample. The plot shows mean 
(solid lines) ± s.e.m. (shaded areas). Significant differences between CA1
versus CA3 or CA1 versus CA1 with NR lesions are indicated by dashed lines at 
the top (P < 0.05). To estimate the decoding performance for a specific number 
of cells, the desired number of cells was randomly selected from the entire data 
set across all animals. As the total trial number of runs on left- or right-run 
trajectories was often different across the recording sessions from which the cell 
group was taken, we randomly subsampled the trials to equalize the total 
trial number across sessions. This subsampling procedure was performed ten 
times, decoding performance was acquired for each, and the average was taken 
as an estimate of the decoding performance of the cell group. Then, a 
different cell group with the same number of cells was randomly selected and 
the same procedure was performed. The procedure was repeated 1,000 times to 
acquire a statistical distribution of decoding performance for the given number 
of cells (bootstrap resampling method). P values were estimated from the 
bootstrap distributions. The peak firing rates of approximately 15 CA1 place 
cells on the stem provided sufficient information to indicate a correct 
succeeding trajectory with over 90% accuracy (96.6 ± 4.3% with a total of 
30 cells, mean ± s.e.m.). The decoding accuracy was significantly lower when 
CA3 cells or CA1 cells from NR-lesioned animals were used as inputs to the 
classifier (decoding performance with 30 cells: CA3, 69.9 ± 7.0%; NR-lesioned, 
76.9 ± 7.6%). These results suggest that, for correct choices, the subsequent 
trajectory can be read out reliably from the collective firing of place cell 
ensembles in CA1 of animals with intact NR-CA1 connections. b, Left, 
decoding of correct subsequent trajectory using firing rates of CA1 place cells 
from NR-lesioned animals as inputs to the classifier. Trials with correct choices 
and error trials are shown separately (number of trials analysed for animals 
with NR lesions: correct, 467; error, 23; cell number per session, 7.5 ± 0.9 
(mean ± s.e.m.)). Symbols as in Fig. 5a. We also estimated the decoding 
performance using peak firing rates on the stem (without binning). In the CA1 
of lesioned animals, the decoding performance was not significantly different 
between correct trials and error trials (67.2 ± 2.2% versus 69.6 ± 9.8%; 
mean ± s.e.m.). In CA1 of control animals, performance was 80.5 ± 1.2% on 
correct trials and 56.0 ± 10.1% on error trials (interaction term in a logistic 
regression analysis with task performance (correct versus incorrect) and 
manipulation (control versus NR lesion) as coefficients, Z = 2.08, P = 0.038; 
post hoc binomial test for correct versus incorrect trials: NR lesion, Z = 0.232, 
P = 0.816; control, Z = 3.03, P = 0.002). Right, retrospective and 
prospective components estimated from spike rates of CA1 cells in animals 
with NR lesions. Symbols as in Fig. 5c. NR lesions specifically disrupted the 
prospective component of the trajectory representation (50.6 ± 4.9% 
successful decoding of succeeding path on the last half of the stem, compared to 
59.5 ± 4.7% in intact animals; chance level 50%), supporting the idea that 
the mPFC-NR-CA1 circuit is necessary for the hippocampus to access 
information about intended actions. The retrospective component was still 
decodable (60.1 ± 4.9% successful decoding of retrospective paths on the last 
half of the stem, compared to 63.5 ± 4.8% in intact animals), suggesting that 
CA1 cells with residual trajectory dependence after NR lesions exclusively 
represent the animal’s trajectory on the preceding trial. c, d, Differences in 
decoding performance are not caused by differences in running speed, head 
direction or lateral position. c, Differences in running speed, head direction, 
and lateral position between left- and right-turn trajectories were assessed 
across all recording sessions used for the decoding analysis. No significant 
differences in any of these behavioural variables were observed (P > 0.05, t-test 
with Bonferroni correction for multiple comparisons of six bins). The plot 
shows mean ± s.e.m. Standard errors were estimated for each recording 
session. d, While we did not observe any significant difference in any of the 
above behavioural variables, we still observed small systematic trends, as shown 
in c. To exclude the possibility that these small trends have an influence on 
the decoding performance, we generated, for each variable, a subsampled data 
set by iteratively excluding the trial with the largest deviation until we 
obtained nearly the same mean value for the respective behavioural variable on 
left- and right-turn trajectories. The decoding performance was calculated from 
this subsampled data set and the result was compared to the one from the 
original data set. No significant difference in decoding performance was 
observed between the groups, suggesting that the small and non-significant 
differences of behavioural variables do not account for the differences in firing 
on left and right trials and the decoding that results from these differences. 
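
The resampling scheme described in a can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' analysis code: the classifier is a nearest-class-mean readout (one simple kind of linear classifier, chosen here for brevity) scored by leave-one-out cross-validation, the firing rates are synthetic, and every name (`loo_accuracy`, `group_accuracy`, `bootstrap_decoding`, `synthetic_cells`) and default parameter is hypothetical.

```python
import random
from statistics import fmean


def loo_accuracy(samples, labels):
    """Leave-one-out accuracy of a nearest-class-mean (linear) classifier."""
    hits = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        dist = {}
        for cls in (0, 1):
            # Training rows of this class, with the held-out trial removed.
            rows = [s for j, (s, l) in enumerate(zip(samples, labels))
                    if j != i and l == cls]
            centroid = [fmean(col) for col in zip(*rows)]
            dist[cls] = sum((a - b) ** 2 for a, b in zip(x, centroid))
        hits += (0 if dist[0] <= dist[1] else 1) == y
    return hits / len(samples)


def group_accuracy(left, right, cells, rng, n_subsamples=10):
    """Average accuracy for one cell group over repeated trial subsampling."""
    # Sessions differ in trial count, so equalise to the smallest count.
    n_trials = min(min(len(left[c]), len(right[c])) for c in cells)
    scores = []
    for _ in range(n_subsamples):
        left_cols = [rng.sample(left[c], n_trials) for c in cells]
        right_cols = [rng.sample(right[c], n_trials) for c in cells]
        # One row per pseudo-trial, one column per cell.
        samples = list(zip(*left_cols)) + list(zip(*right_cols))
        labels = [0] * n_trials + [1] * n_trials
        scores.append(loo_accuracy(samples, labels))
    return fmean(scores)


def bootstrap_decoding(left, right, n_cells, n_groups=1000, seed=0):
    """Decoding accuracy for n_groups randomly drawn groups of n_cells."""
    rng = random.Random(seed)
    cell_ids = range(len(left))
    return [group_accuracy(left, right, rng.sample(cell_ids, n_cells), rng)
            for _ in range(n_groups)]


def synthetic_cells(n_cells=40, seed=1):
    """Made-up peak rates: each cell prefers one trajectory by 3 Hz."""
    rng = random.Random(seed)
    left, right = [], []
    for _ in range(n_cells):
        base = rng.uniform(2.0, 10.0)
        shift = rng.choice([-3.0, 3.0])
        left.append([rng.gauss(base, 1.0) for _ in range(rng.randint(8, 15))])
        right.append([rng.gauss(base + shift, 1.0)
                      for _ in range(rng.randint(8, 15))])
    return left, right


left, right = synthetic_cells()
accuracies = bootstrap_decoding(left, right, n_cells=10, n_groups=20)
print(f"mean accuracy over groups: {fmean(accuracies):.3f}")
```

The list returned by `bootstrap_decoding` plays the role of the bootstrap distribution for a given number of cells. Comparing two such distributions (for example, CA1 versus CA3) would be one way to obtain P values, although the exact comparison used is not specified in the legend.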

Extended Data Figure 8 | Firing rates in NR and mPFC fail to distinguish 
between discrete environments. We have shown that CA1 cells encode 
intended trajectories by differences in firing rate rather than firing position. A 
similar rate-based coding scheme has been observed in place cells of freely 
foraging animals trained to distinguish between open-field environments 
differing in colour or shape but not location (with similar rate differences 
following changes in colour or shape). Based on this similarity, we asked 
whether activity in mPFC and NR accounts for rate differences also between 
discontinuous environments. We recorded simultaneously 51 place cells from 
CA1 and 49 cells from NR during free running in a pair of differently coloured 
square boxes located at the same place in the room (three rats). A total of 176 
cells were recorded from mPFC in a different set of animals (two rats). 

a, Colour-coded rate maps for a representative sample of simultaneously 
recorded cells in CA1 and NR on consecutive trials of free foraging in the square 
enclosure. Cartoons on top indicate the sequence of trials with different box 
colours (black-white-white-black). Boxes were always in the same location. 
Note strong rate remapping (change in firing rate but not firing location) in 
CA1 but no rate code in NR. b, Colour-coded rate maps for cells in mPFC. 
Symbols as in a. Note lack of change in firing rate. c, Top, change in peak firing 
rate between trials with similar (blue) or different (red) colour configuration. 
Bottom, spatial correlation between trials. The rate change between the two 
environments was significant in CA1 (same versus different colour, peak 
rate change, t = 6.40, P < 0.001) but not in NR (t = 0.875, P = 0.38) or 
mPFC (t359 = 0.924, P = 0.36). Taken together, these results suggest that 
changes in the distribution of firing rate in CA1 can have multiple sources. 
While mPFC-NR inputs may be necessary for trajectory selection, the change 
in rate distribution between discrete environments may depend on other 
hippocampal inputs, such as those from the lateral entorhinal cortex. In the 
foraging task, the firing rates of the mPFC and NR neurons are modulated by 
subsequent direction of movement, mirroring their trajectory-dependent 
firing in the alternation task (Extended Data Fig. 9), but because trajectory 
directions are variable in this task, trajectory-dependent activity is likely to be 
cancelled out in time-averaged rate maps. The colour-reversal task should 
be sufficiently sensitive to detect influences of discrete stimuli, considering that 
mPFC cells do respond to such changes under other conditions. 

Extended Data Figure 9 | Activity of neurons in mPFC and NR correlates 
significantly with movement direction in the continuous alternation task 
and the open field environment. a, Spike-rate maps based on the animal’s 
self-movement were generated by previously published procedures. In brief, position 
and head direction data were smoothed with a 25-sample quadratic local 
regression (loess) fit. Changes in the animal’s position and heading were 
calculated between the start and end of a sliding 100 ms time window to generate 
movement vectors. Movement vectors in each map were binned at 4 cm s⁻¹ × 
4 cm s⁻¹. The self-motion rate map was generated by dividing the sum of 
spike-triggered movement vectors by the total number of movement vectors at each bin, 
which was smoothed by a 2D Gaussian filter with a bandwidth of 1.5 bins. To 
understand the temporal relationship between spike timing and the animal’s 
prospective or retrospective motion, spike-triggered movement vectors were 
generated from movement vectors that were systematically time-shifted relative 
to spike time, from one second before to one second after the spike event. Top, 
colour-coded rate maps for a representative mPFC cell which was tuned to left 
forward movement in the open field environment. Bottom, colour-coded rate 
maps for a representative NR cell that was tuned to right forward movement. b, 
Self-movement information in spikes was estimated using the following equation: 

$$I = \sum_i p_i \, \frac{\lambda_i}{\lambda} \, \log_2 \frac{\lambda_i}{\lambda}$$

where $\lambda_i$ is the mean firing rate in the i-th bin, $\lambda$ is the overall mean firing rate 
and $p_i$ is the probability of the animal being in the i-th bin (occupancy in the 
i-th bin/total recording time). Shaded areas indicate the range of the values 
(mean ± s.e.m.) obtained from a shuffled data set generated by shifting spike 
timings either +2 s or −2 s across the session, which will disrupt 
spike-triggered movement information but maintain spike number and spike 
patterns. The time periods when spikes provide significant information about 
self-centred movement direction compared to the results from the shuffled 
data set are indicated by dashed lines at the top (P < 0.05, t-test). c, 
Self-movement rate-map stability within a recording session. The behavioural 
session (10 min) was divided into first and last halves and Spearman’s 
correlation between self-movement maps generated from each half was 
calculated. Significant map stability was observed around the spike time both 
in mPFC and NR (compared to the shuffled data set, P < 0.05, t-test), 
indicating that spikes provide reliable information about self-movement. d, 
Two representative examples of mPFC cells recorded both in the alternation 
task and in the open field. While these cells expressed a similar trend of 
preferred self-movement direction across the tasks, they exhibited stronger 
self-movement tuning in the alternation task than in the open field 
exploration. 
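
The information measure described in b can be computed directly from a binned rate map. The sketch below assumes the standard bits-per-spike form, that is, the sum over bins of p_i (λ_i/λ) log2(λ_i/λ). It also assumes that the overall mean rate λ is the occupancy-weighted mean of the bin rates and that bins with zero rate or zero occupancy contribute nothing. The function name and arguments are hypothetical.

```python
import math


def self_motion_information(rates, occupancy):
    """Information (bits per spike) carried by a binned rate map.

    rates[i]     -- mean firing rate in bin i (lambda_i)
    occupancy[i] -- time spent in bin i; normalised here to p_i
    """
    total_time = sum(occupancy)
    p = [t / total_time for t in occupancy]
    # Overall mean rate: occupancy-weighted average of the bin rates.
    mean_rate = sum(pi * ri for pi, ri in zip(p, rates))
    info = 0.0
    for pi, ri in zip(p, rates):
        if pi > 0 and ri > 0:  # a bin with zero rate contributes 0
            ratio = ri / mean_rate
            info += pi * ratio * math.log2(ratio)
    return info


# A flat map carries no information; a map that is silent in half of the
# equally occupied bins carries one bit per spike.
print(self_motion_information([3.0, 3.0, 3.0], [1.0, 2.0, 3.0]))
print(self_motion_information([0.0, 2.0], [5.0, 5.0]))
```

To mimic the shuffling control in the legend, the same function could be applied to rate maps rebuilt after shifting spike times by a fixed offset, and the two sets of values compared.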

Extended Data Figure 10 | Conjunctive coding of position and trajectory 
increases representational dimensionality in CA1. Trajectory-dependent 
coding is not necessary for performance in the continuous version of the 
alternation task because animals with complete hippocampal lesions are 
unimpaired in this task, as were the animals with lesions of the NR input to the 
hippocampus in the present study. The continuous alternation task may thus be 
too simple for decision behaviour to be affected by lesions of the NR-CA1 
system. To examine how trajectory-dependent firing might contribute to 
navigation behaviour, we estimated the representational advantage of encoding 
space with trajectory-dependent place cells instead of separate cell populations 
for trajectory and location. It has been suggested that nonlinear integration of 
multimodal information in individual neurons enhances the capacity of 
downstream neurons to classify combinations of features of high-dimensional 
information. Similarly, we hypothesized here that a key advantage of 
trajectory-dependent place cells in CA1 is the enhancement of the classification 
capacity for position-trajectory combinations in efferent neurons, an 
advantage that may not be evident in an alternation task with only a single 
choice point. a, Example of a task that requires discrimination of multiple 
position-trajectory combinations. For successful performance, animals choose 
a right-turn path at the first choice point in A and then a left-turn path at the 
next choice point in B. b, To perform the task in a, the brain might use cells that 
represent correct combinations of movement direction and position on the 
trajectory. An example cell might be active when the animal plans a right path at 
position A as well as a left path at position B, but not otherwise. c, Suppose that 
cells with activity on the stem can be categorized into three classes: trajectory- 
dependent non-place cells, trajectory-independent place cells, and trajectory- 
dependent place cells. The neural activity in b cannot be generated from any 
linear combination of the two former classes (trajectory-dependent non-place 
cells and trajectory-independent place cells), as shown in the following 
argument. Suppose that the activity patterns of trajectory-dependent non-place 
cells, either right turn or left turn, can be expressed by the following activity 
matrices, with each row representing future trajectory, right or left, and each 
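column showing position, that is, A or B: 

$$R = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad L = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}$$

Similarly, activity of trajectory-independent place cells, with firing fields on 
either position A or B, can be expressed as follows: 

$$A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$$

The activity matrix of a downstream neuron driven by a linear combination of 
the above four types of neurons can be expressed as follows: 

$$w_a R + w_b L + w_c A + w_d B = \begin{pmatrix} w_a + w_c & w_a + w_d \\ w_b + w_c & w_b + w_d \end{pmatrix}$$

For the downstream neuron to express the desired activity in b, 

$$\begin{pmatrix} \text{active} & \text{inactive} \\ \text{inactive} & \text{active} \end{pmatrix},$$

the following conditions are required: 

$$\begin{aligned} w_a + w_c &> \theta \quad (1) \\ w_a + w_d &< \theta \quad (2) \\ w_b + w_c &< \theta \quad (3) \\ w_b + w_d &> \theta \quad (4) \end{aligned}$$

where $\theta$ is the threshold of activity. However, summation of (1) and (4) gives 
$w_a + w_b + w_c + w_d > 2\theta$, whereas summation of (2) and (3) gives 
$w_a + w_b + w_c + w_d < 2\theta$, resulting in a contradiction. Thus, neurons with 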
pure selectivity alone cannot generate the desired activity. To achieve the 
activity in b, neurons with nonlinear mixed selectivity, namely trajectory- 
dependent place cells, are required (also see ref. 35). d, To estimate the number 
of implementable patterns in the recorded CA1 neurons, firing rates of 
neurons at each of 12 behavioural states (six stem positions with two future 
trajectory directions) were analysed. In addition to the recorded activity, we 
extended the data using a resampling procedure. Resampling was performed 
by cyclic permutation of firing rates across stem positions. Supposing that 
the original activity of the recorded neurons is represented by sequential 
numbers of six stem positions as (1 2 3 4 5 6), five sets of new activity were 
generated by exchanging activity across stem positions, resulting in (2 3 4 5 6 1), 
(3 4 5 6 1 2), (4 5 6 1 2 3), (5 6 1 2 3 4) and (6 1 2 3 4 5). Resampling not 
only increased the number of neurons for analysis, but also minimized 
spatial bias of ensemble representations across stem positions. Following 
resampling, decoding performance was calculated for all binary combinations 
of 12 states ($2^{12}$ = 4,096 patterns), using a linear classifier with firing rates 
in each behavioural state as inputs, as in Fig. 5. e, The number of implementable 
patterns was determined for neurons in CA1, CA3, and CA1 from animals 
with NR lesions, and from CA1 of NR-lesioned animals combined with NR 
cells from intact animals. For the latter group, the total number of cells was 
doubled after combining the same number of cells from two populations, as 
indicated on the x axis with a different colour. A binary pattern was considered 
as implementable if the decoding performance was better than 99%. For 
each sample size, cells were randomly selected five times to estimate the 
standard deviation of the decoding performance. Plots indicate mean ± s.d. 
Regardless of the size of the cell sample, the analysis showed a significantly 
larger number of implementable patterns for CA1 than for the other groups, 
including the combination of trajectory-dependent cells in NR and non- 
trajectory-dependent place cells in CA1, suggesting that integration of NR 
inputs in CA1 place cells is a key step to achieve high-dimensional 
representations. The results point to trajectory-dependent place cells and the 
mPFC-NR-CA1 circuit as possible elements of the neural circuit for 
discrimination of complex position-trajectory combinations, such as the one 
illustrated in a. Combinatorial coding provides a computational basis for 
efferent neurons to perform addition or subtraction among vectors in different 
coordinate systems, such as the allocentric reference frame imposed by 
spatial cells in the entorhinal cortex and the egocentric trajectory frame 
dependent on projections from mPFC through NR. Such vector operations 
may be essential for the network to estimate a future allocentric position, which 
is one of the key steps of route planning during goal-directed navigation. 
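
The contradiction derived in c can also be checked by brute force. The sketch below assumes the matrix convention stated in the legend (rows for future trajectory, columns for position) and restricts weights and threshold to a small integer grid so that the arithmetic is exact. It introduces two hypothetical trajectory-dependent place cells, `RA` and `LB`, purely for illustration; none of the names come from the original analysis.

```python
from itertools import product

# Rows: future trajectory (right, left). Columns: position (A, B).
R = ((1, 1), (0, 0))   # trajectory-dependent non-place cell, right turns
L = ((0, 0), (1, 1))   # trajectory-dependent non-place cell, left turns
A = ((1, 0), (1, 0))   # trajectory-independent place cell at A
B = ((0, 1), (0, 1))   # trajectory-independent place cell at B

# Desired downstream activity: right at A and left at B, nothing else.
TARGET = ((True, False), (False, True))


def drive(weights, inputs, row, col):
    """Weighted sum of the input cells for one trajectory/position state."""
    return sum(w * cell[row][col] for w, cell in zip(weights, inputs))


def implements_target(weights, theta, inputs):
    """True if the unit is above theta exactly where TARGET is True."""
    for row, col in product(range(2), repeat=2):
        d = drive(weights, inputs, row, col)
        if d == theta or (d > theta) != TARGET[row][col]:
            return False
    return True


def count_solutions(inputs, grid=range(-3, 4)):
    """Number of (weights, theta) combinations on the grid that work."""
    return sum(
        implements_target(weights, theta, inputs)
        for weights in product(grid, repeat=len(inputs))
        for theta in grid
    )


# Hypothetical conjunctive (trajectory-dependent place) cells.
RA = ((1, 0), (0, 0))  # fires at A only before right turns
LB = ((0, 0), (0, 1))  # fires at B only before left turns

print(count_solutions([R, L, A, B]))           # pure selectivity only
print(implements_target((2, 2), 1, [RA, LB]))  # mixed selectivity
```

The first count is zero whatever grid is chosen, in line with the argument in c, whereas weights of 2 on each conjunctive cell with a threshold of 1 reproduce the target pattern.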

Lymphatic vessels arise from specialized 
angioblasts within a venous niche 


J. Nicenboim, G. Malkinson, T. Lupo, L. Asaf, Y. Sela, O. Mayseless, L. Gibbs-Bar, N. Senderovich, T. Hashimshony, 
M. Shin, A. Jerafi-Vider, I. Avraham-Davidi, V. Krupalnik, R. Hofi, G. Almog, J. W. Astin, O. Golani, S. Ben-Dor, 
P. S. Crosier, W. Herzog, N. D. Lawson, J. H. Hanna, I. Yanai & K. Yaniv 


How cells acquire their fate is a fundamental question in developmental and regenerative biology. Multipotent 
progenitors undergo cell-fate restriction in response to cues from the microenvironment, the nature of which is 
poorly understood. In the case of the lymphatic system, venous cells from the cardinal vein are thought to generate 
lymphatic vessels through trans-differentiation. Here we show that in zebrafish, lymphatic progenitors arise from a 
previously uncharacterized niche of specialized angioblasts within the cardinal vein, which also generates arterial and 
venous fates. We further identify Wnt5b as a novel lymphatic inductive signal and show that it also promotes the 
‘angioblast-to-lymphatic’ transition in human embryonic stem cells, suggesting that this process is evolutionarily 
conserved. Our results uncover a novel mechanism of lymphatic specification, and provide the first characterization 
of the lymphatic inductive niche. More broadly, our findings highlight the cardinal vein as a heterogeneous structure, 
analogous to the haematopoietic niche in the aortic floor. 


The lymphatic system plays a crucial role in normal and pathological 
conditions. It is essential for maintaining fluid homeostasis, for 
immune responses and for dietary lipid absorption, and is exploited 
by tumours to metastasize. Close to a century ago, two models 
describing the origins of the lymphatic system were proposed. 
While Sabin suggested a venous origin for the lymphatic endothelium, 
the second model, put forward by Huntington & McClure, 
postulated that lymphatic vessels form by concrescence of discontinuous 
and independent lymph vesicles, and that mesenchymal-derived 
cells constitute the walls of the lymphatic vessels. Studies performed 
during the last decade, involving in vivo imaging in zebrafish and 
lineage tracing in mice, have extensively confirmed Sabin’s hypothesis. 
Nevertheless, the presence of mesenchymal lymphangioblast-derived 
lymphatic vessels has been described in Xenopus tadpole and 
chick embryos. At present, the embryonic origins of the lymphatic 
endothelium still remain controversial. 

In recent years, specific markers of the lymphatic endothelium 
have been identified, which provided new insights into the 
mechanisms controlling lymphatic specification and growth. 
Assembly of the lymphatic vascular network is considered a stepwise 
process, which begins approximately at embryonic day 9.5 (E9.5) 
when the expression of Prox1, a master regulator of lymphatic differentiation 
and maintenance, is first detected in a subpopulation of 
endothelial cells within the cardinal vein (CV). Two additional transcription 
factors, Sox18 (ref. 10) and Nr2f2 (also known as 
COUP-TFII, ref. 11), were shown to be required for induction of 
Prox1 expression. The newly specified lymphatic progenitors then 
bud from the CV in response to VEGFC signalling and form primitive 
lymph sacs, which eventually give rise to the entire lymphatic 
vasculature. Most recently, an important role for BMP2 (ref. 13) and 
the RAF1/MEK/ERK signalling cascade in the specification of 
lymphatic fate has also been established. Nevertheless, as none of 
these factors seems to be asymmetrically expressed, the question of 
how only a subset of cells within the CV is initially specified towards a 
lymphatic fate, as opposed to cells that will maintain a venous identity, 
remains unanswered. 

The zebrafish was recently shown to possess a lymphatic system 
that shares many similarities with lymphatic vessels found in other 
vertebrates. In vivo imaging of 2-4 days post-fertilization (dpf) 
zebrafish embryos demonstrated that the parachordal cells (PACs), 
which form at ~2 dpf along the embryo’s midline and serve as building 
blocks for the lymphatic system later on, are derived from the 
posterior cardinal vein (PCV). Starting at ~2.5 dpf, PACs migrate 
ventrally to generate the main lymphatic vessel, the thoracic duct. 


Lymphatic progenitors originate in the PCV floor 


To characterize the initial events controlling lymphatic specification, 
we fate-mapped the origins of lymphatic endothelial cells (LECs) 
within the PCV of zebrafish embryos. We imaged Tg(fli1:EGFP) 
(Fig. 1a, Supplementary Video 1) and Tg(fli1:nEGFP) (ref. 16; 
Extended Data Fig. 1a) embryos starting at 22-24 h post-fertilization 
(hpf) and until 60 hpf, when PACs are fully discernible. Tracking of 
PAC-LECs back in time and space demonstrated that 81% of these 
cells originated in the ventral side of the PCV (vPCV) compared to 
19% that originated in the dorsal PCV (dPCV). To corroborate these 
results, we took advantage of Tg(fli1:gal4;uasKaede) embryos, 
expressing the photoconvertible protein Kaede in endothelial cells 
(Kaede is a green fluorescent protein that irreversibly converts to 
red fluorescence under UV light). Pan-Kaede photoconversion of 
vPCV cells at 24 hpf rendered ~90% of PACs red, indicating that they 
originated in the floor of the PCV. In contrast, less than 10% red PACs 
were observed following dPCV photoconversion (Fig. 1b-d and 
Extended Data Fig. 1b). vPCV cells also generated PACs in plcg1 
mutants (Extended Data Fig. 1d and Supplementary Video 2), which 

Figure 1 | Lymphatic progenitors originate in the vPCV. a, Snapshots from a 
time-lapse sequence of a Tg(fli1:EGFP) embryo, showing the origin of a PAC 
cell (green) in the vPCV (PACs, n = 16; imaged embryos, n = 13). b, c, Photoswitching 
of ventral (b) and dorsal (c) PCV in Tg(fli1:gal4;uasKaede) embryos at 24 hpf 
(light-blue arrowheads). d, Percentage of red PACs (white arrows) at 48 hpf 
(vPCV photoswitched embryos, n = 10; dPCV photoswitched embryos, n = 8; *P = 2.66 × 10⁻⁹). 
Scale bars, 30 µm. Error bars, mean ± s.e.m. 

lack arterial intersegmental vessels as well as blood flow, but develop 
venous sprouts and PACs, suggesting that the specification of 
lymphatic progenitors is not affected by nearby arteries or by blood 
circulation. 


The ventral PCV harbours a niche of specialized 
angioblasts 


Previous reports indicated that the budding of LEC progenitors 
from the PCV persists for approximately 24 h. We reasoned that a 
continuous exit of cells would eventually result in disruption of 
the PCV wall, unless LECs arise from a population of specialized 
progenitors that repeatedly divide. Time-lapse sequences of 
Tg(fli1:gal4;uasKaede) embryos revealed that vPCV cells 
undergo asymmetric division (Fig. 2a) and generate progeny that 
contribute to the nascent PACs (Supplementary Video 3). To further 
confirm these results we scored symmetric vs asymmetric division 
events on each half of the PCV. Cell division was defined as asym- 
metric if (1) it generated a cell of different fate, and (2) the plane of 
division was perpendicular to the PCV main axis. We found a signifi- 
cantly higher number of asymmetric divisions in the vPCV at 24-34 
hpf (initial stages of LEC specification), with no changes in symmetric 
division events (Extended Data Fig. 1e). In addition, no differences in 
global cell proliferation were detected in the dPCV, vPCV and dorsal 
aorta (Extended Data Fig. 2a), suggesting that the specific arising of 
LECs from the floor of the PCV is not a result of this being a more 
proliferative area. 

Unexpectedly, during the course of tracing photoconverted vPCV 
cells we noticed that, in addition to generating PACs, these cells also 
migrated ventrally to incorporate into the supraintestinal artery (SIA) 
and the subintestinal vein (SIV) (Fig. 2b and Extended Data Fig. 3a). 
Single-cell Kaede photoconversion revealed the dynamics of spe- 
cification of the vPCV progenitors (Fig. 2c). Whereas at 23 hpf most 
of these cells give rise to either PACs or venous intersegmental vessels, 
at 27 hpf there is a shift towards population of the subintestinal vein 
and supraintestinal artery. In contrast, dPCV cells generated mostly 
venous intersegmental vessels throughout all analysed developmental 
stages. Altogether, these results unveil the presence of specialized cells 
within the floor of the PCV, which divide asymmetrically, and generate 
arterial, venous and lymphatic fates. 

Figure 2 | vPCV cells are specialized angioblasts. a, Snapshots from a 
time-lapse movie of a Tg(fli1:gal4;uasKaede) embryo showing photoswitched 
vPCV cell (light-blue arrowhead), which generates PACs (white arrowhead) 
through asymmetric division. b, Single photoswitched vPCV cell in 
Tg(fli1:gal4;uasKaede) embryo at 24 hpf (light-blue arrowhead), whose progeny 
populates the supraintestinal artery (SIA, white arrows) and subintestinal vein 
(SIV, yellow arrows) at 56 hpf. c, Location of vPCV and dPCV progeny at 56 
hpf, following photoswitching at different stages (photoswitched vPCV cells, n = 73; 
photoswitched dPCV cells, n = 45). d, Photoswitching of medial and early-lateral 
angioblasts at 17 and 20 hpf (light-blue arrowheads), respectively, 
in Tg(kdrl:Kaede) embryos. e, Percentage of red PACs at 48 hpf 
(medial angioblasts, n = 16; early-lateral angioblasts, n = 16; *P = 2.1 × 10°). 
f, g, Tg(flt1_9a_cFos:GFP,fli1:dsRed) embryos show flt1_9a:GFP+ endothelial 
cells in the SIA, arterial intersegmental vessels (ISA), dorsal aorta (DA) (f), and 
vPCV (g, green, orange arrowheads). h, Selected genes enriched in vPCV cells. 
Scale bars, 30 µm. Error bars, mean ± s.e.m. 


We then asked whether these cells in fact represent angioblasts that 
originate directly in the lateral plate mesoderm and migrate medially 
to colonize the floor of the PCV. Alternatively, these cells could be of 
arterial origin, and sprout ventrally from the dorsal aorta to reach the 
ventral PCV. To answer this question, we performed pan-Kaede 
photoconversion of a population of lateral plate mesoderm medial 
angioblasts that colonize the dorsal aorta by 17 hpf, or of a population 
of ventral cells (lateral plate mesoderm early-lateral angioblasts), 
detected in the trunk by ~19 hpf (Fig. 2d and Extended 
Data Fig. 1c). Fate analysis of the photoconverted cells at 48 hpf 
demonstrated that the vast majority of vPCV progenitors giving rise 
to PACs in the trunk did not originate in the dorsal aorta, but 
migrated directly from the lateral plate mesoderm to reach their final 
position in the vPCV (Fig. 2d, e). These results indicate that the vPCV 
cells are specialized angioblasts, which originate directly in the lateral 
plate mesoderm and retain their multipotency throughout later 
stages of development. 

To gain insight into the molecular identity of the newly identified 
vPCV angioblasts we initially analysed Tg(fli1:dsRed) zebrafish 
(ref. 23) crossed to Tg(flt1_9a_cFos:GFP), a vegfr1 (flt1) enhancer, 
which specifically labels arterial endothelial cells (Extended Data 
Fig. 3b). We detected green fluorescence in well-established ‘arterial’ 
structures including the dorsal aorta, arterial intersegmental vessels 
and supraintestinal artery (Fig. 2f). Surprisingly, however, we also 
detected a few GFP+ cells within the PCV (Fig. 2g). To understand 
whether the flt1_9a:GFP+ vPCV cells represent the population 
of specialized angioblasts that give rise to LECs we imaged 
Tg(flt1_9a_cFos:GFP;lyve1:dsRed2) double transgenic embryos 
(ref. 24), in which arterial endothelial cells are GFP+, while venous 
and lymphatic endothelial cells display red fluorescence (Extended 
Data Fig. 3b, c). Time-lapse sequences revealed that 100% of PACs 
traced (n = 9) originated from flt1_9a:GFP+ cells (Supplementary 
Video 4), through a process of asymmetric cell division. Interestingly, 
we found that the vast majority of these progenitors were located in 
the vPCV (n = 7). Nonetheless, the small number of dPCV cells that 
generate PACs (Fig. 1d) was also labelled by flt1_9a:GFP (n = 2), 
highlighting this angioblast population as the sole origin of LECs in 
the zebrafish trunk. Similar asymmetric division events were detected 
during subintestinal vessel formation (Extended Data Fig. 3d). In this 
case, flt1_9a:GFP+ vPCV cells generated progeny that populated 
the subintestinal vein and the supraintestinal artery. Altogether 
these results highlight the PCV as a highly heterogeneous tissue, 
containing ‘non-venous’ cells competent to give rise to multiple fates, 
including LECs. 

The fact that lymphatic vessels originate from a novel population of 
PCV angioblasts and not from fully differentiated venous endothelial 
cells, as previously postulated, prompted us to enquire into the 
molecular signature of these cells. Global expression profiling via 
RNA sequencing (RNA-Seq) (Extended Data Fig. 4a, b) revealed 
significant enrichment of well-established angioblast, lymphatic and 
arterial markers in the vPCV vs dPCV cells (Fig. 2h and Extended 
Data Fig. 4c, d). We then asked when these progenitors acquire a 
lymphatic fate. In mammals, the expression of the transcription factor 
Prox1 in certain cells of the CV marks the onset of lymphatic specification. 
To investigate whether this is the case in zebrafish as well, 
we imaged TgBAC(prox1a:KalT4-UAS:uncTagRFP) embryos. 
We found that the first cells expressing prox1a are already visible 
at 22-24 hpf in the vPCV (Fig. 3a). Later on these cells divide, 
translocate to the dorsal PCV, and bud from the PCV to generate 
PACs (Fig. 3a and Supplementary Video 5). Approximately 80% 
of Tg(fli1:EGFP;prox1a:KalT4-UAS:uncTagRFP) embryos displayed 
1-2 prox1a+ cells in the vPCV at 22-24 hpf, in contrast to ~20% 
embryos displaying prox1a+ cells in the dPCV (Fig. 3b). At later 

Figure 3 | LEC specification is induced in the vPCV angioblasts. a, A 
(prox1a:TagRFP;fli1:EGFP)+ cell in the vPCV (yellow) generates a daughter 
cell that translocates dorsally, buds from the PCV and forms PACs (white 
arrowheads). b, Quantification of Tg(fli1:EGFP;prox1a:KalT4-UAS:uncTagRFP) 
embryos with 1 to 2 or more than 2 (prox1a:TagRFP;fli1:EGFP)+ cells in vPCV 
and dPCV at 22-24 and 26-30 hpf (22-24 hpf, n = 11; 26-28 hpf, n = 24). c, Prox1 
immunostaining at 24 hpf shows expression in vPCV cell (light-blue 
arrowheads). d, Quantification of (flt1_9a:GFP;prox1a:TagRFP)+ cells in vPCV 
vs dPCV at 22-24 and 26-28 hpf (22-24 hpf, n = 18; 26-28 hpf, n = 13; *P = 0.01). 
Scale bars: a, 30 µm; c, 60 µm. Error bars, mean ± s.e.m. NS, not significant. 


58 | NATURE | VOL 522 | 4 JUNE 2015 


stages (26-30 hpf), an increased number of prox1a+ cells was detected 
in the dPCV, reflecting the proliferation and dorsal translocation of 
the newly specified LECs. Similar results were obtained when we 
analysed the distribution of the Prox1 protein (Fig. 3c). We further 
found that most of the cells that expressed prox1a at 22-24 hpf 
were flt1_9a:GFP+ vPCV-angioblasts (Fig. 3d). Taken together, our 
results analysing global gene expression and lymphatic-specific markers 
demonstrate that lymphatic specification is induced in a 
restricted population of angioblasts in the vPCV. Furthermore, they 
confirm that LECs acquire a lymphatic fate before their budding 
from the PCV. 


Wnt5b induces LEC specification 

Having identified the floor of the PCV as the origin of lymphatic 
progenitors, we analysed the surrounding tissues in search of a 
source of spatially restricted inductive signals. Histological sections 
of 22 and 24 hpf Tg(fli1:EGFP) embryos showed that vPCV cells develop 
close to the endoderm (Fig. 4a). In addition, analysis of cas (sox32) 
mutants, which lack endoderm-derived tissues, revealed that PACs do 
not develop in these mutants (Fig. 4b and Extended Data Fig. 5a), 
suggesting that the signal(s) necessary for LEC specification come from the 
endoderm. 

Recently, the Wnt-β-catenin-TCF/LEF signalling pathway has 
been shown to directly activate Nr2f2 and Prox1 (members of the 
LEC specification cascade) in the context of adipogenesis and 
neurogenesis~”’**. We thus wondered whether endoderm-secreted Wnt(s) 
could serve as inducer(s) of lymphatic specification in the vPCV cells. 
In situ hybridization revealed clear expression of wnt5b messenger 
RNA in the endoderm of 18-20 hpf embryos (Fig. 4c and Extended 
Data Fig. 5b). Analysis of wnt5b morphants and ppt (wnt5b) 
mutants indicated a significant reduction in the percentage of 

Figure 4 | Wnt5b is necessary and sufficient for LEC specification. 
a, Histological sections at 22 and 24 hpf, depicting the position of the vPCV 
angioblasts (light-blue arrows) and the endoderm (e). b, Number of PAC-containing 
segments in wild-type (WT) and cas mutants (ncas = 22; nWT = 15; 
*P =3 X10 |). c, In situ hybridization at 18 hpf shows expression of wnt5b 
mRNA in the endoderm. d, Number of PAC-containing segments (arrows) 
in wnt5b morphants (nUI = 63, nwnt5b MO = 57; *P = 1.83 X 10 7°). UI, 
uninjected. e, Wnt5b overexpression in Tg(hsp70l:wnt5b-GFP;lyve1:dsRed2) 
embryos induces ectopic lymphangiogenesis (nWT = 19, nhsp70:wnt5b-GFP = 22; 
*P = 9.47 X 10°”). hs, heat shock at 37°C. f, Number of PAC-containing 
segments in sox32-MO injected Tg(hsp70l:wnt5b-GFP;lyve1:dsRed2) embryos 
(nsox32 MO = 18, nsox32 MO;hsp70:wnt5b = 18; *P = 0.0005). Scale bars, a, 20 μm; 
e, f, 60 μm. Error bars, mean ± s.e.m. 

PAC- and thoracic-duct-containing segments, with no changes in the 
initial number of flt1_9a:GFP+ vPCV angioblasts (Fig. 4d and 
Extended Data Fig. 5c-f). In contrast, overexpression of Wnt5b in 
Tg(hsp70l:wnt5b-GFP;lyve1:dsRed2) double transgenic embryos 
at 23-24 hpf (Extended Data Fig. 6a) resulted in a strong 
pro-lymphangiogenic response reflected by the presence of ectopic 
PAC sprouts (Fig. 4e). Finally, Wnt5b induction led to a significant 
recovery in the number of PACs in sox32 morpholino (MO)-injected 
Tg(hsp70l:wnt5b-GFP) embryos (Fig. 4f). Taken together, 
these results highlight the endoderm-secreted Wnt5b as both necessary 
and sufficient for lymphatic formation during embryonic 
development. 

To confirm that Wnt5b is specifically required for lymphatic 
specification, and not for general sprouting from the PCV, we 
assessed the number of venous vs arterial intersegmental vessels in 
wnt5b MO- and control MO-injected Tg(flt1_9a_cFos:GFP;fli1:dsRed) 
embryos, and found no differences (Extended Data Fig. 6b). 
Likewise, flt1_9a:GFP+ vPCV progenitors were normally found 
within the subintestinal plexus of wnt5b morphants (Extended Data 
Fig. 6c), confirming that Wnt5b does not inhibit PAC formation by 
unselectively impeding sprouting from the PCV, but rather by affecting 
LEC specification. To ascertain whether Wnt5b affects LEC proliferation, 
we photoconverted and time-lapse imaged vPCV cells in 
wnt5b MO-injected Tg(fli1:gal4;uasKaede) (Extended Data Fig. 7a) 
and Tg(fli1:nEGFP;fli1:dsRed) (Supplementary Video 6) embryos. 
While in control siblings approximately 30% of the vPCV cells 
reached the PACs by 48 hpf (Fig. 2c), they did not engage in dorsal 
migration to generate PACs in wnt5b morphants. Interestingly, 
although the cells were viable and divided normally, the only asymmetric 
division events detected involved cells that migrated ventrally 
to populate the subintestinal vessels (data not shown). In addition, 
ectopic induction of Wnt5b did not result in enhanced endothelial cell 
proliferation (Extended Data Fig. 2b). 

Conclusive evidence supporting a role for Wnt5b as an inducer of 
LEC specification was provided by the analysis of lymphatic marker 
expression following wnt5b downregulation and overexpression. 
In situ hybridization revealed a pronounced reduction in 
lymphatic-specific transcripts in the PCV of wnt5b morphants, whereas 
the expression of pan-endothelial genes remained unchanged 
(Extended Data Fig. 7b). This phenotype, indicative of a defect in 
lymphatic specification, was not reported following loss of Vegfc, a 
signal specifically required for LEC budding from the PCV””*. In 
addition, the expression of vegfc and ccbe1*° remained unchanged in 
sox32 and wnt5b morphants (Extended Data Fig. 7c), ruling out the 
possibility that Wnt5b controls lymphatic specification through 
activation of these genes. Finally, the number of prox1a+ cells was 
reduced in wnt5b morphants, and increased following Wnt5b 
overexpression (Fig. 5a). Moreover, Wnt5b activation induced upregulation 
of the prox1a transcript (Fig. 5b), and the Prox1 protein 
(Extended Data Fig. 7d). Taken together, these data indicate that 
Wnt5b is mainly required for lymphatic specification, and not migration 
or proliferation, of the vPCV angioblasts. 

Recently, a divergence in the molecular mechanisms controlling 
lymphatic specification in zebrafish and mice was postulated*®. To 
ascertain whether the novel mechanism of LEC specification uncovered 
here is conserved among vertebrates, we tested the ability of 
recombinant WNT5B to induce lymphatic specification in human 
embryonic stem cell (hESC)-derived vascular progenitors*'**. As 
seen in Fig. 5c, WNT5B induced a marked increase in the fraction 
of LYVE1+ cells detected in the culture, as well as in the levels of 
PROX1 (Fig. 5d) and FLT4 (Extended Data Fig. 7e) mRNAs, indicating 
that the role of Wnt5b as a potent inducer of LEC specification 
is evolutionarily conserved. Furthermore, these findings indicate 
that Wnt5b acts directly on vascular progenitors to promote the 
‘angioblast-to-lymphatic’ specification. 

Figure 5 | Wnt5b induces lymphatic specification in zebrafish and hESCs. 
a, Quantification of Prox1a+ vPCV cells (light-blue arrowheads) in 
Tg(fli1:EGFP;prox1a:KalT4-UAS:uncTagRFP) embryos following Wnt5b 
induction, and downregulation (nwnt5b MO = 7, nhsp70:wnt5b = 9; 
nWT = 7; *P = 0.05, **P = 0.001). b, prox1a mRNA in 24 hpf 
Tg(fli1:EGFP;prox1a:KalT4-UAS:uncTagRFP;hsp70l:wnt5b-GFP) embryos 
following heat shock at 21 hpf (nindependent experiments = 4). c, Fraction of 
LYVE1+ cells, and d, PROX1 mRNA levels in hESC-derived angioblasts treated 
with WNT5B (nindependent experiments = 3). Scale bar, 60 μm. Error bars, 
a, mean ± s.e.m.; b, d, geometrical mean ± standard error of the geometrical 
mean (s.e.g.m.). 


Wnt5b-activated canonical pathway induces LEC 
specification 
We next characterized the downstream components of the Wnt pathway 
involved in lymphatic specification. Wnt5 is mostly referred to as 
a non-canonical Wnt ligand, which can also repress and/or activate 
the canonical pathway in different contexts*’**. It is well established 
that a key step in the activation of the canonical Wnt pathway is the 
inhibition of a destruction complex composed of APC, axin, GSK3-β 
and other proteins, which results in stabilization and nuclear 
translocation of cytoplasmic β-catenin*’. We therefore began by analysing 
lymphatic development following manipulation of axin and APC. 
Injection of wnt5b MOs into mbl (axin1) mutants did not affect 
PAC formation (Extended Data Fig. 8a), confirming the requirement 
of axin downstream of Wnt5b. Likewise, apc mutants displayed 
significantly increased PAC numbers (Extended Data Fig. 8b), resembling 
the Wnt5b overexpression phenotype (Fig. 4e). Conversely, axin 
overexpression (Extended Data Fig. 8c), as well as treatment with 
IWR1, a small molecule shown to lower the levels of β-catenin, 
induced a significant reduction in the number of PAC-containing 
segments (Extended Data Fig. 8d). In contrast to these results, the 
inhibitor of β-catenin-independent Wnt activation, TNP-470, did not 
cause any detectable lymphatic defects (Extended Data Fig. 9a). We 
then investigated the role of the TCF/LEF transcription factors*® in 
early lymphangiogenesis. Downregulation of tcf4, tcf7, lef1 and tcf3b 
(Extended Data Fig. 9b, c and data not shown) resulted in a reduced 
number of PACs with an otherwise normal blood vasculature. In line 
with the phenotypes resulting from Wnt5b downregulation, arterial/venous 
differentiation was not impaired in these morphants 
(Extended Data Fig. 9b), and photoswitched vPCV cells did not 
migrate dorsally to generate PACs (Extended Data Fig. 9d). 
Together, these results indicate that induction of lymphatic specification 
by Wnt5b occurs primarily through β-catenin/TCF activation. 

The lymphatic defects derived from Wnt/β-catenin inhibition 
could be secondary to global Wnt5b-signalling depletion. Alternatively, 
they could reflect a cell-autonomous requirement for Wnt 
signalling within prospective LEC progenitors. To distinguish between 
these two possibilities, we assessed β-catenin/TCF activity within 
vPCV angioblasts using Tg(fli1:EGFP;7xTCF-Xla.Siam:nlsmCherry) 

Figure 6 | Wnt5b induces LEC specification through activation of the 
canonical pathway. a, Tg(7xTCF-Xla.Siam:nlsmCherry;fli1:EGFP) embryo 
showing TCF activity in vPCV cells at 24 hpf (light-blue arrowheads), and PACs 
at 48 hpf (white arrows). b, Tg(7xTCF-Xla.Siam:nlsmCherry;flt1_9a_cFos:GFP) 
embryo showing TCF activity in flt1_9a+ vPCV angioblasts at 24 hpf 
(light-blue arrowheads). c, KalT4 mRNA levels in 
Tg(prox1a:KalT4-UAS:uncTagRFP;hsp70l:wnt5b-GFP) 24 hpf embryos, following heat shock at 21 hpf 
(nindependent experiments = 3). d, Schematic model of LEC specification and 
formation of first lymphatic vessels in the zebrafish trunk. Scale bars, 30 μm. 
Error bars, geometrical mean ± s.e.g.m. 


double transgenic embryos. As seen in Fig. 6a, TCF activity was 
detected in these cells at 24 hpf, and in PACs at 48 hpf. Furthermore, 
time-lapse imaging revealed that only vPCV angioblasts with active 
β-catenin/TCF undergo asymmetric cell division and generate PACs 
(Extended Data Fig. 10a). Moreover, these cells were also flt1_9a:GFP+ 
(Fig. 6b). The number of β-catenin/TCF+ vPCV angioblasts was 
significantly reduced following wnt5b downregulation (Extended Data 
Fig. 10b, c), confirming that the β-catenin/TCF activity detected in 
these cells was Wnt5b-dependent. Altogether, our results analysing 
β-catenin/TCF activity in LEC progenitors in vivo, in combination with 
LEC specification in cultured hESCs, indicate that Wnt5b-dependent 
activation of β-catenin is cell-autonomously required within vascular 
progenitors for proper lymphatic specification, and highlight Prox1 as 
one of the major downstream targets of Wnt5b. 

The changes in prox1 mRNA levels observed in zebrafish and 
hESCs (Fig. 5b, d) could result from either transcriptional regulation, 
or post-transcriptional modifications that alter RNA stability of the 
prox1 transcript. To distinguish between these two possibilities we 
took advantage of the TgBAC(prox1a:KalT4-UAS:uncTagRFP) zebrafish 
reporter (Fig. 3 and ref. 26), in which the KalT4 fragment 
recapitulates the transcriptional activation of the endogenous prox1a 
promoter, without being subjected to the post-transcriptional 
modifications of the prox1a gene (the KalT4 cassette possesses its own 3′ 
untranslated region). We hypothesized that if Wnt5b transcriptionally 
regulates prox1a mRNA, overexpression of Wnt5b will result in a 
significant increase in the levels of KalT4 mRNA. If, in turn, prox1a 
upregulation involves alterations in its mRNA stability, the levels of 
KalT4 mRNA will remain unchanged upon hsp70:Wnt5b activation. 
As seen in Fig. 6c, overexpression of Wnt5b results in elevated levels of 
the KalT4 transcript. Although we cannot exclude the possibility that 
post-transcriptional modifications are also involved in prox1a 
regulation, our results strongly support a mechanism involving 
transcriptional regulation of prox1a in response to Wnt5b. Whether this is a 
direct or indirect regulation remains to be elucidated. 


Discussion 


Development and regeneration of multicellular organisms rely on the 
ability of competent cells to respond to different signalling inputs that 
specify cell fate. The results presented here identify for the first time a 
pool of specialized angioblasts within the floor of the posterior car- 
dinal vein that bears the potential to generate arterial, venous and 
lymphatic fates. Anatomically, these cells develop in close proximity 
to the endoderm, which serves as a source of Wnt5b, a novel inductive 
signal promoting the angioblast-to-lymphatic transition (Fig. 6d). 
Interestingly, the time-frame of induction of lymphatic fate in the 
vPCV angioblasts fully overlaps with the endodermal expression of 
the Wnt5b ligand, highlighting a tight spatiotemporal regulation of 
cell differentiation within this niche. 

Recently, a divergence in the molecular mechanisms controlling 
lymphatic specification in zebrafish and mice was suggested”*. The 
finding that Wnt5b functions as a potent inducer of lymphatic cell 
fate, both in zebrafish and in hESC-derived vascular progenitors, 
provides compelling evidence for a strong conservation of this path- 
way among vertebrates. 

Previous reports have postulated an arterial origin for the cardinal 
vein, both in zebrafish*! and mammals”. Here we show that cells 
expressing arterial/angioblast markers within the posterior cardinal 
vein are those that generate lymphatic progenitors during embryonic 
development. However, it is striking that these cells do not originate in 
the dorsal aorta, but rather migrate directly from the lateral plate 
mesoderm to populate the ventral wall of the posterior cardinal vein, 
retaining their ‘multipotent’ capacities. 

Altogether our results highlight the posterior cardinal vein as a 
highly heterogeneous structure containing different cell populations, 
thereby challenging the current view of a ‘strict’ venous origin for 
lymphatic vessels. Our findings help settle a century-old controversy 
regarding the origin of the lymphatic endothelium by providing evid- 
ence for a novel mechanism, which reconciles the models proposed by 
Sabin* and Huntington & McClure*. On the one hand lymphatic 
endothelial cells do emerge from veins; however, they do so by an 
unexpected mechanism involving a venous niche of specialized meso- 
derm-derived angioblasts. These findings open a whole set of new 
questions regarding the formation of lymphatic vessels during disease 
states and regeneration. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


Zebrafish husbandry and transgenic lines. Zebrafish were raised by standard 
methods’* and were handled according to the guidelines of the Weizmann Institute 
Animal Care and Use Committee. The plcg1 (ref. 18), Tg(fli1:EGFP) (ref. 19), 
Tg(fli1:nEGFP) (ref. 4), Tg(fli1:dsRed) (ref. 23), Tg(fli1:gal4;uasKaede) (ref. 17), 
Tg(hsp70l:wnt5b-GFP) (ref. 38), Tg(7xTCF-Xla.Siam:nlsmCherry) (ref. 39), 
cas (ref. 40), ppt (ref. 41), mbl (ref. 42), apc (ref. 43), 
Tg(lyve1:dsRed2) (ref. 24), and TgBAC(prox1a:KalT4-UAS:uncTagRFP) 
(ref. 26) lines were previously described. The Tg(flt1_9a_cFos:GFP) reporter was 
generated by cloning the previously identified zebrafish flt1_9a enhancer“ into 
pGW_cFosGFP“*. The Tg(kdrl:Kaede) line was generated by cloning a Kaede 
fragment into a Tol2-compatible vector containing 2.5 kb from the kdrl promoter using the 
Gateway methodology**. 

In situ hybridization and immunofluorescence. Whole-mount in situ 
hybridization was carried out as described’* using flt4, ccbe1*°, sox18” and cdh5"° 
antisense mRNA probes. The lyve1 (5′-AGACGTGGGTGAAATCCAAG-3′ and 
5′-GATGATGTTGCTGCATGTCC-3′), wnt5b (5′-ATGGATGTGAGAATGAACCAAGGAC-3′ 
and 5′-CTACTTGCACACAAACTGGTCTACG-3′), and 
vegfc (5′-CATCAGCACTTCATACATCAGC-3′ and 5′-GTCCAGTCTTCCCCAGTATG-3′) 
probes were amplified by PCR from 24 hpf complementary 
DNA. A fragment (1,269 bp) flanking the 
(5′-GTACAAAAAAGCAGGCTCCGCGGCC-3′...5′-TCATCAGGGATATGTTGCTGTCGGG-3′) sequence 
of the nr2f2 gene was cloned into the pCS2 plasmid, and linearized using NotI. 
Embryos were imaged using a Leica M165 FC imaging system. 

Phospho-histone staining was carried out as described“ using p-histone H3 
antibody (1:300) (Santa Cruz). 

For detection of Prox1 protein, embryos were fixed overnight in 4% 
paraformaldehyde, washed in 100% methanol, incubated 1 h on ice in 3% H2O2 in 
methanol, washed in 100% methanol and stored at −20°C. Embryos were then 
permeabilized in wash buffer (PBS/0.1% Tween/0.1% Triton), blocked in 10% goat 
serum/1% BSA in wash buffer for 5 h at 4°C, and incubated with Prox1 antibody 
(1:750) overnight*. Samples were then washed 5 times with wash buffer, followed 
by washes with maleic buffer (150 mM maleic acid/100 mM NaCl/0.001% Tween 
20 pH 7.4 saturated with 10 N NaOH), blocking in maleic buffer containing 2% 
blocking reagent (Roche), and incubation overnight at 4°C with goat anti-rabbit 
IgG-horseradish peroxidase (Jackson 1:500) for TSA signal amplification. 
Following washes with maleic buffer and PBS, samples were incubated for 3 h 
with TSA Plus Cyanine 3 reaction (Perkin Elmer) and washed with wash buffer 
several times through 1-2 days at room temperature. 

Manipulation of zebrafish embryos. Heat-shock, TNP-470 and IWR1 treatments. 
24-26 hpf Tg(hsp70l:wnt5b-GFP) embryos were heat-shocked at 37°C 
for 25 min and analysed for PAC number at 56 hpf. For Prox1 immunostaining, 
Tg(hsp70l:wnt5b-GFP) embryos were heat-shocked at 19-20 hpf for 25 min and 
fixed as described above at 28 hpf. For qRT-PCR analyses, 
Tg(prox1a:KalT4-UAS:uncTagRFP;hsp70l:wnt5b-GFP) embryos were heat-shocked at 21 hpf for 
25-30 min. 

IWR1 (ref. 49) (Sigma) and TNP-470 (Sigma) were dissolved in dimethyl 
sulfoxide (DMSO) as previously described*’. Embryos were treated with 30 μM 
IWR1 for 2 days starting at 20 hpf. TNP-470 was added at a concentration of 
25 μM for 2 days starting at 23 hpf. PAC formation was assessed at 3 dpf. 

Morpholino injection. The following antisense morpholino oligonucleotides were 
used: sox32 (ref. 51) (1 ng), tcf7 (ref. 52) (8 ng), lef1 (ref. 53) (3 ng), wnt5b (ref. 41) 
(7.5 ng or 4 ng for subdose), vegfc (ref. 4) (5 ng), tcf4 
(5′-CTGCGGCATTTTTCCCGAGGAGCGC-3′) (8 ng), control MO 
(5′-CCTCTTACCTCAGTTACAATTTATA-3′) (8 ng). MOs (Gene Tools) were resuspended and injected 
as described’. 

DNA and mRNA injection. axin mRNA™ (260 pg) was injected at the 1-cell stage. To 
generate the Tg(flt1_9a_cFos:GFP) and Tg(kdrl:Kaede) transgenic lines, ~30 pg 
of plasmid was injected along with 30 pg of Tol2 transposase mRNA into 1-cell 
stage embryos. 

Quantitative real-time PCR (qRT-PCR). qRT-PCR was carried out as previously 
described’® using the following primers: 

Zebrafish: prox1a (5′-AATCCAAGAGGGGCTTTCGC-3′ and 5′-TGCAGCGGTTAAACTTCACG-3′), 
KalT4 (5′-GACGCTGTGACAGACCGATT-3′ and 5′-CAGCTGTCTCTGTCCCTTGT-3′), 
bactin2'®, etv2 (5′-TACCCAGGATCTGGACCCAT-3′ and 5′-CAGCCATCACCAGTCCAACT-3′), 
frz7a (5′-TGTCTCGTGCGGACTGTTAC-3′ and 5′-CACTGTTCATGAGGCTCCGT-3′), 
nr2f2 (5′-ACAGAGTGGTCGCCTTTATGG-3′ and 5′-CCACACGCATCTGAAGTGAA-3′). 

H: PROX1 (5′-CCACTGACCAGACAGAAGCA-3′ and 5′-TGGGCTCTGAAATGGATAGG-3′), 
beta-actin (5′-TCCACCTTCCAGCAGATGTG-3′ and 5′-GCATTTGCGGTGGACGAT-3′), 
FLT4 (5′-AAGAAGTTCCACCACCAAACAT-3′ and 5′-TGAAAATCCTGGCTCACAAGC-3′) and 
CDH5 (5′-AACTTCCCCTTCTTCACCC-3′ and 5′-AAAGGCTGCTGGAAAATG-3′). 


Scoring and quantification of phenotypes. To assess the contribution of dorsal 
vs ventral PCV to different vascular beds, single EC, or pan-Kaede 
photoconversion was carried out in Tg(fli1:gal4;uasKaede) embryos. Photoswitching was 
performed using a 405 nm laser. To assess the contribution of medial vs early lateral 
angioblasts to PACs, endothelial cells in 4 segments of Tg(kdrl:Kaede) embryos 
were photoswitched at 17-18 hpf and 20-21 hpf, respectively. 28 h later, embryos 
were imaged, and the number of green vs red PACs was counted in 6 segments 
over the photoswitched area. 

For quantification of phenotypes, the average number of PACs or thoracic 
duct per cell number in 9-10 segments over the yolk extension was calculated. 
Embryos with no fluorescence or with gross vascular morphological defects were 
excluded from quantification. For analysis, embryos that met all the criteria 
above were randomly selected. For quantification of PAC related phenotypes 
in ppt mutants, a subdose of wnt5b MO (4 ng) was injected into ppt embryos 
to abolish maternal RNA contribution as described”. 

Imaging. Confocal imaging was performed using a Zeiss LSM 780 upright 
confocal microscope (Carl Zeiss, Jena, Germany) with a W-Plan Apochromat ×20 
objective, NA 1.0. Fluorescent proteins were excited sequentially with 
single-photon lasers (488 nm, 563 nm). Two-photon imaging of GFP was carried out 
at 920 nm. Time-lapse, in vivo imaging was performed as previously described”® 
using a custom-built chamber for perfusion of embryos with 
temperature-controlled physiological medium. z-stacks were acquired at 2.5-3 μm increments, 
every 10-12 min. 

Embryo dissociation, fluorescence activated cell sorting (FACS) and RNA 
sequencing. Following pan-Kaede photoconversion of dorsal, or ventral PCV 
in Tg(fli1:gal4;uasKaede) embryos at 24 hpf, 6 embryos per group were used for 
FACS isolation of Kaede photoconverted (red) endothelial cells. Single-cell 
suspensions were prepared as described'® with some modifications (the embryos 
were not chopped, and no Liberase was used). Sorting was performed at 4°C in 
a FACSAria cell sorter using a 70-μm nozzle. Photoconverted (red) endothelial 
cells were collected in 1 ml PBS, washed with PBS and centrifuged twice at 300g, 
at 4°C for 5 min. Total RNA was extracted using TRI Reagent (Sigma) as 
described”, except that only GenElute-LPA (Sigma) was added to help precipitate 
the RNA. RNA-Seq was performed as previously described’* with the following 
modification: a new set of primers was used with a shorter barcode, and a 5-base 
unique molecule identifier to enable transcript counting. 

RNA sequencing data analysis. CEL-Seq data was normalized by dividing the 
reads of each gene by the total reads of the sample and multiplying by 10,000 
(transcript per 10,000). Genes without any detected expression, or with express- 
ion detected only in one sample were filtered out. For identification of significant 
differentially expressed genes, fold change was calculated and a two-sample t-test 
was conducted. 
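
The normalization and filtering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the toy read counts and the choice to report an infinite fold change when the reference group has zero expression are all hypothetical, and the two-sample t-test step is left out.

```python
def normalize_per_10k(counts):
    """Scale one sample's raw read counts to transcripts per 10,000."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: 10_000 * reads / total for gene, reads in counts.items()}


def genes_detected_in_multiple_samples(samples, min_samples=2):
    """Drop genes with no expression, or with expression in only one sample."""
    all_genes = set().union(*samples)
    return {
        gene for gene in all_genes
        if sum(1 for s in samples if s.get(gene, 0) > 0) >= min_samples
    }


def fold_change(group_a, group_b, gene):
    """Ratio of mean normalized expression (group_a over group_b)."""
    mean_a = sum(s.get(gene, 0.0) for s in group_a) / len(group_a)
    mean_b = sum(s.get(gene, 0.0) for s in group_b) / len(group_b)
    # Assumption: an undetected reference group is reported as infinity.
    return float("inf") if mean_b == 0 else mean_a / mean_b


# Hypothetical raw counts for two ventral and two dorsal samples.
ventral = [normalize_per_10k({"etv2": 30, "nr2f2": 10, "cdh5": 60}),
           normalize_per_10k({"etv2": 45, "nr2f2": 5, "cdh5": 50})]
dorsal = [normalize_per_10k({"etv2": 10, "cdh5": 88, "sox18": 2}),
          normalize_per_10k({"etv2": 20, "cdh5": 80})]

kept = genes_detected_in_multiple_samples(ventral + dorsal)
etv2_fc = fold_change(ventral, dorsal, "etv2")
```

In this toy data set the gene detected in a single sample is filtered out, and the remaining genes can then be ranked by fold change before significance testing.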

Gene Ontology analysis was performed with Ontologizer 2.0°° using the 
Topology-Weighted algorithm on the set of genes with a change of at least 1.5 
fold between the ventral and dorsal samples. The associations were taken from 
geneontology.org, Version 1.4 from ZFIN. 

Image processing. Images were processed off-line using ImageJ (NIH) and 
Imaris (Bitplane). Selected data sets were deconvoluted with Autoquant X3 
(Media Cybernetics). For co-localization analyses confocal images were first 
deconvoluted and then analysed using the Imaris ‘Colocalization Module’. We 
used this new channel to mark, and manually count cells that are labelled with 
both EGFP and mCherry/TagRFP fluorophores. Co-localization thresholds and 
nuclei quantification were set manually. Where necessary, movies were registered 
with the “Linear Stack Alignment with SIFT” plugin of FIJI. 

Histology. Tg(fli1:EGFP) embryos were fixed in 4% PFA for 20 min at room 
temperature, embedded in gelatin-bovine albumin medium (0.35% gelatin, 21% 
bovine albumin) as previously described*. 50-100 μm cross-sections were 
obtained using a Leica VT 1000s vibratome and stained in 1:200 dilution of 
TRITC-Phalloidin (Sigma) as previously described”. 

Human embryonic stem cells. Induction of differentiation towards the 
endothelial lineages has been previously described’. Briefly, H9 cells were seeded as single 
cells on collagen-IV (Sigma) coated plates at 5 × 10° cells per cm² and cultured 
with MEM-alpha (Invitrogen), 10% FBS (Hyclone) and 0.1 mM β-mercaptoethanol 
for 6 days. At day 6, cells were re-seeded on collagen-IV coated plates 
at 1.25 × 10* cells per cm² and cultured under ECGM (Promocell) + 20% FBS, 
50 ng ml⁻¹ VEGF-A (Biolegend Inc., San Diego CA) and 10 μM SB431542 
(Sigma Aldrich). To induce lymphatic differentiation, cells were supplemented with 
100 ng ml⁻¹ Wnt5b (R&D) every other day starting from day 6. At day 12 
RNA was extracted with TRI Reagent (Sigma)/chloroform, and cDNA was 
produced using the SuperScript III kit (Invitrogen). 

H9 cells were obtained and handled by the Stem Cells Research Center at the 
Weizmann Institute (Israel), and were routinely checked for karyotype and for 
mycoplasma contamination. 

For FACS analyses cells were harvested using non-enzymatic dissociation 
solution (Sigma), stained with an allophycocyanin-conjugated Lyve1 antibody 
(R&D Systems, Minneapolis, MN) for 30 min at room temperature, washed with 
PBS 3% FCS, stained with propidium iodide (Sigma) and analysed via 
FACSAria III. Dead cells were excluded from analysis by gating out propidium 
iodide-positive cells. 

Statistical analyses. No statistical methods were used to predetermine 
sample size. 

Data was analysed using the unpaired two-tailed Student’s t-test assuming 
unequal variance from at least two independent experiments, unless stated other- 
wise. In all cases normality was assumed and variance was comparable between 
groups. Sample size was selected empirically following previous experience in the 
assessment of experimental variability. The investigators were not blinded to 
allocation during experiments and outcome assessment. We chose the adequate 
tests according to the data distribution to fulfil test assumptions. Numerical data 
are the mean ± s.e.m., unless stated otherwise. 
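
As an illustration of the test named above, the sketch below computes the standard error of the mean and the statistic of an unpaired t-test assuming unequal variance (Welch's form), together with the Welch-Satterthwaite degrees of freedom. The function names and the sample values are hypothetical. The two-tailed P value additionally requires the Student's t distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one commonly used routine that returns it.

```python
from math import sqrt
from statistics import mean, stdev, variance


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def welch_t(a, b):
    """t statistic and degrees of freedom for unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical counts of PAC-containing segments in two groups of embryos.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
```

The non-integer degrees of freedom are expected: they are what distinguishes this form of the test from the pooled-variance version.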

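For qRT-PCR experiments we computed the standard error for each fold-change. 
For genes with more than a single fold-change value, $x_1, x_2, \ldots, x_n$, each with a 
standard error $\Delta x_1, \Delta x_2, \ldots, \Delta x_n$, we computed the mean fold-change by taking 
the geometrical average, $\bar{x} = \sqrt[n]{x_1 x_2 \cdots x_n}$. Its standard error was computed using 
error propagation, 

$$\Delta \bar{x} = \frac{\bar{x}}{n} \sqrt{\left(\frac{\Delta x_1}{x_1}\right)^2 + \left(\frac{\Delta x_2}{x_2}\right)^2 + \cdots + \left(\frac{\Delta x_n}{x_n}\right)^2}.$$ 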
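
A short numerical sketch of this calculation is given below. It assumes first-order error propagation with independent errors, so that the relative error of the geometric mean is 1/n times the quadrature sum of the individual relative errors; the function name and the example fold-changes are hypothetical.

```python
from math import prod, sqrt


def geometric_mean_with_error(values, errors):
    """Geometric mean of fold-changes and its propagated standard error."""
    if len(values) != len(errors) or not values:
        raise ValueError("need one standard error per fold-change value")
    n = len(values)
    gmean = prod(values) ** (1.0 / n)
    # Quadrature sum of the relative errors of the individual fold-changes.
    rel = sqrt(sum((dx / x) ** 2 for x, dx in zip(values, errors)))
    return gmean, gmean * rel / n


# Two hypothetical fold-changes, each with a 10% standard error.
gm, gm_err = geometric_mean_with_error([2.0, 8.0], [0.2, 0.8])
```

With two measurements that each carry a 10% relative error, the propagated relative error of the geometric mean is about 7%, smaller than either input, as expected when independent measurements are combined.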

Extended Data Figure 1 | Mesoderm-derived angioblasts generate LECs 
through asymmetric cell division. a, Snapshots from a time-lapse sequence 
of a Tg(fli1:nEGFP) zebrafish embryo, showing the origin of a PAC cell 
(yellow) in the vPCV (nimaged embryos = 7). b, vPCV (left panel), and dPCV 
(right panel) Kaede photoconverted cells at 48 hpf. c, Kaede-photoswitched 
‘medial’ (left panel) and ‘early lateral’ (right panel) angioblasts. d, Snapshots 
from a time-lapse sequence of a plcg1 mutant, showing the origin of a 
PAC cell (green) in the vPCV (nimaged embryos = 3). e, Quantification of 
symmetric and asymmetric division events in the vPCV and dPCV of double 
Tg(flt1_9a_cFos:GFP;lyve1:dsRed2) embryos (nimaged embryos = 6). Scale 
bars, 30 μm. 

Extended Data Figure 2 | Analysis of cell division in the zebrafish axial 
vessels. a, Phospho-histone H3 staining shows no difference in the number of 
proliferative endothelial cells among the DA, dPCV and vPCV (n24 hpf embryos = 17, 
n26 hpf embryos = 16, n28 hpf embryos = 16, n30 hpf embryos = 16). b, Ectopic 
induction of Wnt5b does not result in enhanced proliferation of endothelial cells 
(26 hpf, ncontrol embryos = 15, nhsp70:wnt5b embryos = 8; 28 hpf, ncontrol embryos = 14, 
nhsp70:wnt5b embryos = 8; 30 hpf, ncontrol embryos = 14, nhsp70:wnt5b embryos = 10). 
Scale bar, 60 μm. Error bars, mean ± s.e.m. 

Extended Data Figure 3 | Fate map analysis of vPCV cells. a, Schematic 
representation of the subintestinal plexus at 72 hpf. Subintestinal vein (SIV, 
green), interconnecting SI vessels (purple), supraintestinal artery (SIA, pink), 
posterior cardinal vein (PCV, blue), dorsal aorta (DA, red). b, Quantification of 
the number of intersegmental arteries (ISA) and intersegmental veins (ISV) 
in the first four segments of Tg(flt1_9a_cFos:GFP;lyve1:dsRed2) double 
transgenic embryos (nembryos = 41). IS# denotes the position of the 
intersegmental vessel. c, Confocal images of Tg(lyve1:dsRed2) (left panel) 
and Tg(flt1_9a_cFos:GFP;lyve1:dsRed2) (right panel) embryos showing 
lyve1:dsRed2+ endothelial cells in PACs, venous intersegmental vessels (ISVs), 
PCV and SIV, and flt1_9a:GFP+ endothelial cells in the SIA. d, A flt1_9a:GFP+ 
vPCV angioblast (light-blue arrowhead) divides asymmetrically (curved 
arrow) to generate cells that populate the SIV (31.5 hpf, white arrowhead), 
and the SIA (53.5 hpf, white arrowhead). Scale bar, 30 μm. Error bars, 
mean ± s.e.m. 


©2015 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 4 (panel labels recovered from the image). Dorsal switch, ventral switch; FACS isolation of Kaede photoswitched ECs; RNA seq; relative expression of etv2, frz7a and nr2f2. Panel d, gene ontology enrichment in vPCV cells, plotted as P value (-log10): angiogenesis; endothelial cell migration; lymph vessel development; notch signaling pathway; vasculogenesis; tube morphogenesis; arterial endothelial cell fate commitment; sphingosine-phosphate signaling pathway; canonical Wnt receptor signaling; hematopoietic stem cell migration; negative regulation of ERK1 and ERK2 cascade. 


Extended Data Figure 4 | Transcriptional profiling of vPCV angioblasts. 
a, Experimental setup used for RNA sequencing analysis of FACS isolated 
vPCV and dPCV cells. b, FACS isolation of green vs red (photoconverted) 
endothelial cells from Tg(fli1:gal4;uasKaede) embryos following 
photoswitching of dorsal or ventral PCV (nindependent experiments = 4). c, 
qRT-PCR analysis of selected candidates shows enrichment in ventral vs dorsal 
PCV cells (nindependent experiments = 2). d, Gene Ontology enrichment in vPCV 
vs dPCV cells (results represent 2 out of 4 independent biological repeats). 
Error bars, geometrical mean ± s.e.g.m. 
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Extended Data Figure 5 | Endoderm-derived Wnt5b is required for 
lymphatic development. a, PAC-containing segments in WT (arrows) and 
cas mutants (asterisks). b, In situ hybridization at 20 hpf showing expression of 
wnt5b mRNA (blue arrowhead) in the endoderm of WT embryos. ¢, PAC- 
containing segments in uninjected (UI) (arrows) and wnt5b MO-injected 
embryos (asterisks). d, ppt mutants injected with wnt5b MO (subdose) display 
significant reduction in PAC number (71u1 embryos = 38, Nwntsb-MO embryos sub = 
38, Nppt-UI embryos — 34, "ppt, wntSb MO sub-embryos — 34; *P= 1.2 X 10 °). 

e, wnt5b morphants exhibit marked reduction in the number of thoracic duct- 
containing segments (asterisks) as compared to uninjected (UJ) siblings 
(arrows) (ut-embryos = 38; Myntsb MO-embryos = 32; *P = 4.5 X 10-**), f, The 
number of flt1 vPCV progenitors is not affected in wnt5b morphants 
(AULembryos = 31, Mwntsb-MO embryos = 31). Scale bars, a, c, 60 um; b, e, f, 30 jum. 
Error bars, mean ± s.e.m.

Extended Data Figure 6 | Wnt5b is not required for sprouting from the
PCV. a, Phenotypic analysis of Wnt5b overexpression in Tg(hsp70l:wnt5b-
GFP; lyve1:dsRed2) embryos, following 25-30 min heat shock (HS), at 23, 25
and 27 hpf (23 hpf embryos n_HS-25 min = 18, n_HS-30 min = 14, n_HS-40 min = 15; 25
hpf embryos n_HS-25 min = 14, n_HS-30 min = 17, n_HS-40 min = 20; 27 hpf embryos
n_HS-25 min = 19, n_HS-30 min = 17, n_HS-40 min = 10). b, The number of vISVs vs
aISVs is unaltered in wnt5b morphants as compared to Control MO-injected
siblings (n_Control MO-embryos = 43, n_wnt5b MO-embryos = 41). c, flt1_9a+ vPCV
cells are detected in the supraintestinal artery (SIA) and subintestinal vein (SIV)
of wnt5b MO-injected embryos (n_Ctrl MO = 16, n_wnt5b MO = 16). Scale bars,
60 μm. Error bars, mean ± s.e.m.

Extended Data Figure 7 | Wnt5b induces the “angioblast-to-lymphatic”
specification. a, Selected frames from a time-lapse sequence of a
Tg(fli1:gal4;uasKaede) embryo injected with wnt5b MO. Photoconverted vPCV
cell (white arrow) divides normally (arrows at 48 hpf point to 2 daughter cells),
but does not engage in dorsal migration to generate PACs. b, In situ
hybridization of Ctrl MO- and wnt5b MO-injected zebrafish at 30 hpf, with
lyve1, sox18, nr2f2 and cdh5 probes, showing specific decrease in lymphatic
marker expression in the floor of the PCV (white arrowheads) of wnt5b
morphants. The pan-endothelial marker cdh5, as well as the arterial expression
of sox18, remain unchanged in wnt5b morphants. c, vegfc and ccbe1 mRNA
levels remain unaltered in sox32 and wnt5b morphants. d, Immunostaining of
Prox1 shows marked increase in protein levels following ectopic activation of
Wnt5b in Tg(hsp70l:wnt5b; fli1:EGFP) embryos (co-localization channel is
shown in yellow, white arrowheads). e, RT-PCR analysis of FLT4 and CDH5
in hESCs treated with WNT5B (n = 3 independent experiments; *P = 0.03 by one
sample t-test). Scale bars, 60 μm. Error bars, geometrical mean ± s.e.g.m.

Extended Data Figure 8 | Wnt5b induces LEC specification through
activation of canonical pathway. a, PAC-containing segments (arrows)
in wnt5b MO-injected mbl mutants (n_wnt5b MO = 42; n_mbl, wnt5b MO = 52;
*P = 3.4 × 10 )). b, apc mutants (n_WT = 18, n_apc = 19; *P = 0.0006). c, axin1
mRNA-injected embryos (n_UI = 33; n_axin1-mRNA = 46; *P = 1.73 × 10⁻¹⁴), and
d, IWR-1 treated embryos (n_DMSO = 55; n_IWR = 54; *P = 1.05 × 10 *).
Scale bars, 60 μm. Error bars, mean ± s.e.m.

Extended Data Figure 9 | Involvement of Tcf transcription factors in LEC
specification. a, PAC number remains unchanged in TNP-470 treated
Tg(fli1:EGFP) embryos as compared to DMSO (control) (n_DMSO = 19; n_TNP-470
= 38). b, c, Quantification of PAC-containing segments in the trunk of UI, tcf7,
lef1 and tcf4 MO-injected embryos (n_UI embryos = 59; n_tcf7-MO embryos = 33,
n_lef1-MO embryos = 16, n_tcf4-MO embryos = 25; *P = 4.53 × 10”,
**P = 9.62 × 10⁻⁸, ***P = 9.12 × 10 °). d, Photoswitching of vPCV cells
in tcf7 MO-injected Tg(fli1:gal4;uasKaede) embryos (white arrowheads) at
24 hpf. At 48 hpf photoconverted, red vPCV cells (arrowheads) remain
in the PCV and do not generate PACs. Scale bars, 30 μm. Error bars,
mean ± s.e.m.

Extended Data Figure 10 | Wnt5b-dependent activation of β-catenin/TCF
in vPCV angioblasts. a, Selected frames from a time-lapse sequence showing
β-catenin/TCF activity in a single vPCV angioblast (light-blue arrowhead),
which generates PACs (white arrowhead) through asymmetric cell division
(n = 2). b, Confocal images of the trunks of Tg(7xTCF-Xla.Siam:nlsmCherry;
fli1:EGFP) double transgenic zebrafish injected with wnt5b MO,
showing decreased β-catenin/TCF activation in vPCV cells (quantified in
c) (n_UI embryos = 18, n_wnt5b embryos = 17; *P = 4 × 10°). Purple signal depicts
co-localization of cytoplasmic EGFP and nuclear mCherry. Scale bars, 30 μm.
Error bars, mean ± s.e.m.

Resonant interactions and chaotic rotation of Pluto’s small moons

M. R. Showalter¹ & D. P. Hamilton²


Four small moons—Styx, Nix, Kerberos and Hydra—follow near-circular, near-equatorial orbits around the central 
‘binary planet’ comprising Pluto and its large moon, Charon. New observational details of the system have emerged 
following the discoveries of Kerberos and Styx. Here we report that Styx, Nix and Hydra are tied together by a 
three-body resonance, which is reminiscent of the Laplace resonance linking Jupiter’s moons Io, Europa and 
Ganymede. Perturbations by the other bodies, however, inject chaos into this otherwise stable configuration. Nix and 
Hydra have bright surfaces similar to that of Charon. Kerberos may be much darker, raising questions about how a 
heterogeneous satellite system might have formed. Nix and Hydra rotate chaotically, driven by the large torques of the
Pluto-Charon binary.


Pluto’s moon Kerberos (previously designated S/2011 (134340)1 or, 
colloquially, P4) was discovered in 2011’ using images from the 
Hubble Space Telescope (HST). It orbits between the paths of Nix 
and Hydra, which were discovered in 2005 and confirmed in 2006’. 
Follow-up observations in 2012 led to the discovery of the still smaller 
moon Styx (S/2012 (134340)1 or P5)°. The complete data set includes 
numerous additional detections of both objects from 2010-2012**, 
plus a few detections from 2005 (H. A. Weaver, personal commun- 
ication, 2011) and from 2006’; see Supplementary Table 1. Figure 1 
shows samples of the available images. Motivated by these discoveries, 
we investigate the dynamics and physical properties of Pluto’s four 
small outer moons. 


Orbits 


Pluto and Charon comprise a ‘binary planet’—two bodies, similar in 
size, orbiting their common barycentre. Their mutual motion creates 
a time-variable and distinctly asymmetric gravity field. This induces 
wobbles in the orbits of the outer moons and also drives much slower 
apsidal precession and nodal regression®. In our analysis, we ignore 
the short-term wobbles and derive time-averaged orbital elements. 
This is equivalent to replacing the gravity field by that of two con- 
centric rings containing the masses of Pluto or Charon, each with a 
radius equal to that body’s distance from the barycentre. 

We have modelled the orbits using six Keplerian orbital elements
(semimajor axis a, eccentricity e, inclination i, mean longitude at
epoch λ₀, longitude of pericentre ϖ₀, and ascending node Ω₀) plus
three associated frequencies (mean motion n, apsidal precession rate ϖ̇,
and nodal regression rate Ω̇). We work in the inertial Pluto-Charon
(P-C) coordinate frame, with Pluto and Charon in the x-y plane and
the z axis parallel to the system’s angular momentum pole (right
ascension 8 h 52 min 5.5 s, declination −6.218°)°. We have solved
for these elements and frequencies under a variety of assumptions
about how they are coupled (Extended Data Table 1). Table 1 lists the
most robustly determined elements, in which we enforce a relationship
that ensures ϖ̇ ≈ −Ω̇; this allows us to fit eight elements rather
than nine. We prefer this solution because root-mean-square (RMS)
residuals are nearly the same as for the solution where ϖ̇ and Ω̇ are
allowed to vary independently. Additional possible couplings, involving
a and n as well, markedly increase the residuals for Styx and Nix;
this suggests that non-axisymmetric gravitational effects, which are
not modelled by our concentric ring approximation, can be important.
The statistically significant (P-value < 1%) ~100-km residuals of
Nix and Hydra (Table 1) match the predicted scale of the un-modelled
wobbles⁶, and so are to be expected.

Table 1 shows that e and i are distinctly non-zero; this was not 
apparent in prior work, which employed a different coordinate frame” 
or was based on 200-year averages⁶. Our results describe each moon’s
motion during 2005-2012 more accurately. Variations in n, e and i are
detectable during 2010-2012 (Extended Data Fig. 1), illustrating the 
mutual perturbations among the moons that have been used to con- 
strain their masses°. 


Search for resonances 


Pluto’s five moons show a tantalizing orbital configuration: the ratios 
of their orbital periods are close to 1:3:4:5:6'*°”. This configuration is 
reminiscent of the Laplace resonance at Jupiter, where the moons Io, 
Europa, and Ganymede have periods in the ratio 1:2:4. Table 1 shows 
the orbital periods P of the moons relative to that of Charon, con- 
firming the near-integer ratios. However, with measured values 
for ϖ̇ and Ω̇ in addition to n, it becomes possible to search for
more complicated types of resonances. A general resonance
involves an angle

$$\Phi = \sum_j \left(p_j \lambda_j + q_j \varpi_j + r_j \Omega_j\right)$$

and its time derivative

$$\dot{\Phi} = \sum_j \left(p_j n_j + q_j \dot{\varpi}_j + r_j \dot{\Omega}_j\right).$$

Here, (p_j, q_j, r_j) are integer coefficients and
each subscript j is C, S, N, K or H to identify the associated moon. A
resonance is recognized by coefficients that sum to zero and produce a
very small value of Φ̇; in addition, the resonant argument Φ usually
librates around either 0° or 180°.

Using the orbital elements and their uncertainties tabulated in
Table 1, we have performed an exhaustive search for strong resonances
in the Pluto system. One dominant three-body resonance
was identified: Φ = 3λ_S − 5λ_N + 2λ_H ≈ 180°. This defines a ratio
of synodic periods: 3S_NH = 2S_SN, where the subscripts identify
the pair of moons. We find that Φ̇ = −0.007 ± 0.001° per day and
that Φ decreases from 191° to 184° during 2010-2012; this is all
consistent with a small libration about 180°. Note that this
expression is very similar to that for Jupiter's Laplace resonance,

Figure 1 | Example HST images of Pluto’s small
moons. a, Kerberos (K) detected 18 May 2005, in
the Nix/Hydra discovery images. b, Kerberos in the
Nix (N) and Hydra (H) confirmation images of 2
February 2006. c, A marginal detection of Styx (S),
along with Kerberos, on 2 March 2006. d, All four
moons, 25 June 2010. e, The Kerberos discovery
image, 28 June 2011, with Styx also identified.
f, The Styx discovery image, 7 July 2011. All images
were generated by co-adding similar images and
then applying an unsharp mask to suppress the
glare from Pluto and Charon.

Table 1 | Derived properties of the moons

Property              Styx                  Nix                   Kerberos              Hydra
a (km)                42,656 ± 78           48,694 ± 3            57,783 ± 19           64,738 ± 3
λ₀ (°)                276.856 ± 0.096       63.866 ± 0.006        94.308 ± 0.021        197.866 ± 0.003
n (° per day)         17.85577 ± 0.00024    14.48422 ± 0.00002    11.19140 ± 0.00005    9.42365 ± 0.00001
e (10⁻³)              5.787 ± 1.144         2.036 ± 0.050         3.280 ± 0.200         5.862 ± 0.025
ϖ₀ (°)                296.1 ± 9.4           221.6 ± 1.4           187.6 ± 3.7           192.2 ± 0.3
ϖ̇ (° per day)         0.506 ± 0.014         0.183 ± 0.004         0.115 ± 0.006         0.070 ± 0.001
i (°)                 0.809 ± 0.162         0.133 ± 0.008         0.389 ± 0.037         0.242 ± 0.005
Ω₀ (°)                183.4 ± 12.5          3.7 ± 3.4             225.2 ± 5.4           189.7 ± 1.2
Ω̇ (° per day)         −0.492 ± 0.014        −0.181 ± 0.004        −0.114 ± 0.006        −0.069 ± 0.001
P (days)              20.16155 ± 0.00027    24.85463 ± 0.00003    32.16756 ± 0.00014    38.20177 ± 0.00003
P/P_C                 3.156542 ± 0.000046   3.891302 ± 0.000004   5.036233 ± 0.000024   5.980963 ± 0.000005
RMS (σ)               1.44                  2.59                  1.27                  2.77
RMS (mas)             17.8                  4.22                  11.2                  3.21
RMS (km)              397                   94                    248                   72
A (km²)               14 ± 4                470 ± 75              29 ± 8                615 ± 55
R100 (km)             2.1 ± 0.3             12.2 ± 1.0            3.0 ± 0.4             14.0 ± 0.6
R38 (km)              3.4 ± 0.5             19.8 ± 1.6            4.9 ± 0.7             22.7 ± 1.0
R06 (km)              8.6 ± 1.2             50 ± 4                12.4 ± 1.7            57 ± 3
a100/b100                                   2.1 ± 0.6                                   1.7 ± 0.6
b100/c100                                   1.2 ± 0.2                                   1.2 ± 0.2
φ2010 (°)                                   25 ± 10                                     39 ± 16
φ2011 (°)                                   37 ± 15                                     46 ± 18
φ2012 (°)                                   46 ± 17                                     38 ± 16
V100 (km³)            39 ± 17               5,890 ± 1,040         117 ± 49              8,940 ± 1,640
GM (10⁻³ km³ s⁻²)     0.0 ± 1.0             3.0 ± 2.7             1.1 ± 0.6             3.2 ± 2.8
  Charon-like         0.018 ± 0.008         2.8 ± 0.5             0.06 ± 0.03           4.2 ± 0.8
  Bright KBO          0.04 ± 0.02           6.2 ± 1.1             0.12 ± 0.05           9.4 ± 1.7
  Median KBO          0.12 ± 0.05           17 ± 3                0.35 ± 0.14           26 ± 5
  Dark KBO            0.26 ± 0.11           39 ± 7                0.78 ± 0.32           60 ± 11

Angles are measured from the ascending node of the P-C orbital plane on the J2000 equator. The epoch is Universal Coordinate Time (UTC) on 1 July 2011. Uncertainties are 1σ. A is disk-integrated reflectivity;
R100, R38 and R06 are radius estimates assuming a spherical shape and p_v = 1, 0.38, and 0.06; V100 is the ellipsoidal volume if p_v = 1. Estimates of GM = Gρp_v^(−3/2)V100 are shown for properties resembling those of
Charon (density ρ = 1.65 g cm⁻³, p_v = 0.38) and three types of KBOs: ‘bright’ (ρ = 0.5; p_v = 0.1), ‘median’ (ρ = 0.65; p_v = 0.06), and ‘dark’ (ρ = 0.8; p_v = 0.04). Boldface values are within 1σ of the dynamical mass
constraints⁶.

Figure 2 | Numerical integrations of the Styx-Nix-Hydra resonance.
Resonant angle Φ is plotted versus time from the current epoch, using
three assumptions for GM_H: 0.0032 km³ s⁻² (a), 0.0039 km³ s⁻² (b), and
0.0046 km³ s⁻² (c); these values are equivalent to the nominal mass, a 0.25σ
increase, and a 0.5σ increase’. GM_N = 0.0044 km³ s⁻² throughout, equivalent
to 0.5σ above its nominal mass. The modest increase in M_H is sufficient to force
a transition of Φ from circulation (Styx outside resonance) to libration (Styx
locked in resonance).


where Φ_L = λ_I − 3λ_E + 2λ_G = 180° and 2S_IE = S_EG. For comparison,
Φ_L librates by only ~0.03° (ref. 10). However, a similar resonant angle
among the exoplanets of Gliese 876 librates about 0° by ~40° (ref. 11).
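The resonant argument and its rate can be checked directly from the tabulated elements. The sketch below is illustrative only: it uses the mean motions and epoch longitudes from Table 1, keeps only the mean-longitude terms (q_j = r_j = 0), and all function and variable names are our own rather than anything used by the authors.

```python
# Mean motions n (deg per day) and mean longitudes at epoch (deg), from Table 1.
MEAN_MOTION = {"S": 17.85577, "N": 14.48422, "K": 11.19140, "H": 9.42365}
LONGITUDE_AT_EPOCH = {"S": 276.856, "N": 63.866, "K": 94.308, "H": 197.866}

# Integer coefficients p_j of the two three-body resonances named in the text.
LAPLACE_LIKE = {"S": 3, "N": -5, "H": 2}
SECOND_RESONANCE = {"S": 42, "N": -85, "K": 43}


def combine(coefficients, values):
    """Return sum_j p_j * values_j for the moons listed in `coefficients`."""
    return sum(p * values[moon] for moon, p in coefficients.items())


def resonant_angle(coefficients, longitudes):
    """Resonant argument Phi in degrees, wrapped to the range [0, 360)."""
    return combine(coefficients, longitudes) % 360.0


def synodic_period(n_inner, n_outer):
    """Synodic period in days of two moons with mean motions in deg per day."""
    return 360.0 / (n_inner - n_outer)


if __name__ == "__main__":
    print("dPhi/dt (deg/day):", combine(LAPLACE_LIKE, MEAN_MOTION))
    print("Phi at epoch (deg):", resonant_angle(LAPLACE_LIKE, LONGITUDE_AT_EPOCH))
    s_sn = synodic_period(MEAN_MOTION["S"], MEAN_MOTION["N"])
    s_nh = synodic_period(MEAN_MOTION["N"], MEAN_MOTION["H"])
    print("3 S_NH, 2 S_SN (days):", 3 * s_nh, 2 * s_sn)
```

With the Table 1 values this gives a rate of about −0.0065° per day and an argument of about 187° at the 1 July 2011 epoch, consistent with the quoted −0.007 ± 0.001° per day and the decrease from 191° to 184° during 2010-2012.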

Using the current ephemeris and nominal masses⁶, our numerical
integrations indicate that Φ circulates, meaning that the resonance is
inactive (Fig. 2). However, libration occurs if we increase the masses of
Nix and Hydra, M_N and M_H, upward by small amounts (Fig. 3).
Between these two limits, Φ varies erratically and seemingly chaotically.
Extension of Fig. 3 to higher masses reveals that libration is
favoured but never guaranteed. By random chance, it would be
unlikely to find Styx orbiting so close to a strong three-body resonance,
and our finding that Φ ≈ 180° increases the likelihood that
this resonance is active. We therefore believe that M_N + M_H has been
slightly underestimated. The net change need not be large (<1σ)°,
and is also compatible with the upper limit on M_N + M_H required for
the long-term orbital stability of Kerberos’. 

Extended Data Fig. 2 shows that Kerberos contributes to the chaos.
To understand its role, we perform simulations in which Pluto and
Charon have been merged into one central body, thereby isolating the
effects of the other moons on Φ. We perform integrations with M_K =
0 and with M_K nominal, and then Fourier transform Φ(t) to detect the
frequencies of the perturbations (Extended Data Fig. 3). When M_K is
non-zero, the power spectrum shows strong harmonics of the three
synodic periods S_SK, S_NK and S_KH; this is because Φ(t) is a linear
combination of λ_S(t), λ_N(t) and λ_H(t), and Kerberos perturbs each
moon during each passage. The harmonics of a second three-body
resonance also appear: Φ′ = 42λ_S − 85λ_N + 43λ_K ≈ 180°, that is,
42S_NK ≈ 43S_SN. This was the second strongest resonance found in our
search; at the orbit of Styx, the two resonances are separated by just
4 km. This is reminiscent of the Uranus system, where chains of near- 
resonances drive the chaos in that system’*"*. 

These results will influence future models of Pluto system forma- 
tion. Charon was probably formed by a large impact into Pluto’, and 
the outer moons accreted from the leftover debris. If Charon had a 
large initial eccentricity, then its corotation resonances could lock 
material into the 1:3:4:5:6 relationship’®. As Charon’s eccentricity 
damped, the resonant strengths waned, but the moons were left with 
periods close to these integer ratios'’. This appealing model has 
numerous shortcomings, however'**°. The presence of a strong

Figure 3 | Mass-dependence of the Laplace-like resonance. The shade of
each square indicates whether the associated pair of mass values produces
circulation (black) or libration (white) during a 10,000-year integration.
The moon masses M_N and M_H are each allowed to vary from nominal to
nominal + 1σ (ref. 6). M_K is nominal. Shades of grey define transitional states:
light grey if Φ is primarily circulating; dark grey if Φ is primarily librating;
medium grey for intermediate states. The transition between black and white
is not monotonic, suggesting a fractal boundary.
[Axis label: Hydra GM (10⁻³ km³ s⁻²).]


Laplace-like resonance places a new constraint on formation models. 
Additionally, future models must account for the non-zero eccent- 
ricities and inclinations of the small satellites; for example, these 
might imply that the system was excited in the past by resonances 
that are no longer active”!”’. 

The resonance enforces a modified relationship between orbits: if
P_N/P_C = 4 and P_H/P_C = 6, then P_S/P_C = 36/11 ≈ 3.27. Nevertheless,
the other three near-integer ratios remain unlikely to have arisen by
chance. Excluding Styx, the probability that three real numbers would
all fall within 0.11 of integers is just 1%.
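Both numbers in this paragraph follow from short calculations. The sketch below rests on two simplifying assumptions of ours, not stated in the text: that the resonance condition is applied to the mean motions alone with n_j = 360°/P_j, and that the fractional part of each period ratio is independent and uniformly distributed.

```latex
\begin{align*}
3n_S - 5n_N + 2n_H = 0
  \;&\Rightarrow\; \frac{3}{P_S} = \frac{5}{P_N} - \frac{2}{P_H} \\
  \;&\Rightarrow\; \frac{3P_C}{P_S} = \frac{5}{4} - \frac{2}{6} = \frac{11}{12}
  \;\Rightarrow\; \frac{P_S}{P_C} = \frac{36}{11} \approx 3.27, \\[4pt]
\Pr\left(\text{three ratios within } 0.11 \text{ of an integer}\right)
  &= (2 \times 0.11)^{3} \approx 0.011 \approx 1\%.
\end{align*}
```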


Shapes, sizes and physical properties 

Mean disk-integrated photometry for each moon is listed in Table 1. 
To infer the sizes of these bodies, we also require their visual geometric 
albedos p_v. Charon is relatively bright, with p_v ≈ 0.38. Kuiper Belt
objects (KBOs) exhibit a large range of albedos, but the smallest KBOs 
tend to be dark; py ~ 0.04-0.08 is common?*”®. 

The photometry is expected to vary with phase angle α and, if a
body is elongated or has albedo markings, with rotational phase.
Extended Data Fig. 4 shows the raw photometry for Nix and Hydra.
In spite of the otherwise large variations, an opposition surge is apparent
for α < 0.5°; this is often seen in phase curves and is indicative
of surface roughness. After dividing out the phase function model,
Fig. 4 shows our measurements versus orbital longitude relative to 
Earth’s viewpoint. The measurements of Nix show no obvious 
pattern, suggesting that it is not in synchronous rotation; this is dis- 
cussed further below. 

With unknown rotation states, we can only assess the light curves in 
a statistical sense. We proceeded with some simplifying assumptions. 
(1) Each moon is a uniform triaxial ellipsoid, with dimensions 
(a100, b100, c100), assuming p_v = 1. (2) Each measurement was taken
at a randomly chosen, unknown rotational phase. (3) Each moon was

Figure 4 | Normalized light curves. Disk-integrated photometry and 1σ error
bars for Nix (a) and Hydra (b) have been normalized to α = 1° and then plotted
as a function of projected orbital longitude. Here 0° corresponds to inferior
conjunction with Pluto as seen from Earth. Measurements are colour-coded by 
year: red for 2010, green for 2011, and blue for 2012. A tidally locked moon 
would systematically brighten at maximum elongation (90° and 270°). 


in fixed rotation about its short axis. (4) The pole orientation may have 
changed during the gap in coverage between years; this is consistent 
with Supplementary Video 1, in which the rotation poles are generally 
stable for months at a time. We therefore describe the orientation by 
three values of sub-Earth planetocentric latitude: φ2010, φ2011, and φ2012.
We used Bayesian analysis to solve for the six parameters that provide 
the best statistical description of the data; see the Methods section 
for details. 
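To make the four assumptions concrete, the following sketch computes the projected area of a uniform triaxial ellipsoid that spins about its short axis and is seen at a given sub-Earth latitude, then samples it at random rotational phases. It is a simplified illustration, not the Bayesian analysis described in the Methods: brightness is taken to be proportional to projected area, the axis lengths are arbitrary relative units chosen to match Nix's axial ratios in Table 1, and the function names are hypothetical.

```python
import math
import random


def projected_area(a, b, c, sub_earth_lat_deg, rot_phase_deg):
    """Projected area of an ellipsoid with semi-axes a >= b >= c.

    The body rotates about its short (c) axis. The line of sight makes an
    angle sub_earth_lat_deg with the equatorial plane and rot_phase_deg with
    the long axis, measured within that plane.
    """
    lat = math.radians(sub_earth_lat_deg)
    phase = math.radians(rot_phase_deg)
    # Unit vector towards the observer, expressed in the body frame.
    vx = math.cos(lat) * math.cos(phase)
    vy = math.cos(lat) * math.sin(phase)
    vz = math.sin(lat)
    # Standard result for the shadow area of an ellipsoid along a unit vector.
    return math.pi * math.sqrt((b * c * vx) ** 2 + (a * c * vy) ** 2 + (a * b * vz) ** 2)


def sample_areas(a, b, c, sub_earth_lat_deg, n_samples=1000, seed=1):
    """Projected areas at uniformly random rotational phases (assumption 2)."""
    rng = random.Random(seed)
    return [
        projected_area(a, b, c, sub_earth_lat_deg, rng.uniform(0.0, 360.0))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    a, b, c = 2.1, 1.0, 1.0 / 1.2  # relative axes with a/b = 2.1 and b/c = 1.2
    for lat in (25.0, 46.0):
        areas = sample_areas(a, b, c, lat)
        print(lat, sum(areas) / len(areas), max(areas) / min(areas))
```

In this toy model a larger sub-Earth latitude raises the mean projected area and shrinks the peak-to-trough variation, which is the qualitative behaviour expected when a rotation pole turns towards the line of sight.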


Nix has an unusually large axial ratio of ~2:1 (Table 1), comparable 
to that of Saturn’s extremely elongated moon, Prometheus. Hydra is 
also elongated, but probably less so. Also, Nix’s year-by-year varia- 
tions (Fig. 4) are the result of a rotation pole apparently turning 
towards the line of sight; this explains both its brightening trend 
and also the decrease in its variations during 2010-2012 (Extended 
Data Fig. 5). Pluto’s sub-Earth latitude is 46°, so Hydra’s measured 
pole is nearly compatible with the system pole. Nix’s pole was ~20° 
misaligned in 2010 but may have reached alignment by 2012. 

Given the inferred volume and an assumed albedo and density, we 
can estimate GM, where M is the mass and G is the gravitation constant.
We consider four assumptions about the moons’ physical properties,
and compare GM to the dynamical estimates⁶ (Table 1). Nix
and Hydra are probably bright, Charon-like objects; if they were 
darker, then GM would be too large to be compatible with upper 
limits on the masses’. 
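The photometric mass estimates can be reproduced from the relation GM = G ρ p_v^(−3/2) V100, where V100 is the ellipsoidal volume for unit albedo. The sketch below is illustrative: the value of G and the unit conversion are standard, the densities and albedos are the four property sets named in the Table 1 notes, and the function and dictionary names are our own.

```python
G_KM3_PER_KG_S2 = 6.674e-20  # gravitational constant in km^3 kg^-1 s^-2

# (density in g cm^-3, geometric albedo) for the four assumed property sets.
PROPERTY_SETS = {
    "Charon-like": (1.65, 0.38),
    "Bright KBO": (0.5, 0.1),
    "Median KBO": (0.65, 0.06),
    "Dark KBO": (0.8, 0.04),
}


def gm_estimate(v100_km3, density_g_cm3, albedo):
    """GM in km^3 s^-2 for a body whose volume would be v100_km3 if albedo = 1.

    Radius scales as albedo**-0.5 at fixed brightness, so volume scales as
    albedo**-1.5.
    """
    density_kg_km3 = density_g_cm3 * 1.0e12  # 1 g cm^-3 = 1e12 kg km^-3
    return G_KM3_PER_KG_S2 * density_kg_km3 * albedo ** -1.5 * v100_km3


if __name__ == "__main__":
    v100 = {"Styx": 39.0, "Nix": 5890.0, "Kerberos": 117.0, "Hydra": 8940.0}
    for label, (rho, p_v) in PROPERTY_SETS.items():
        row = {moon: round(gm_estimate(v, rho, p_v) * 1e3, 3) for moon, v in v100.items()}
        print(label, row)  # values in units of 10^-3 km^3 s^-2
```

For Nix this gives roughly 2.8 × 10⁻³ km³ s⁻² under Charon-like properties and roughly 39 × 10⁻³ km³ s⁻² under dark-KBO properties, in line with the corresponding Table 1 entries.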

Kerberos seems to be very different (Table 1). The dynamical infer- 
ence that its mass is about a third that of Nix and Hydra, yet that it 
reflects only ~5% as much sunlight, implies that it is very dark. This 
violates our expectation that the moons should be self-similar due 
to the ballistic exchange of regolith””. Such heterogeneity has one 
precedent in the Solar System: at Saturn, Aegaeon is very dark (p_v < 0.15),
unlike any other satellite interior to Titan, and even though it
is embedded within the ice-rich G ring. The formation of such a 
heterogeneous satellite system is difficult to understand. 
Alternatively, the discrepancy would go away if the estimate of M_K
is found to be high by ~2σ; this has a nominal likelihood of <1%.
Further study is needed. 


Rotation states 


Nearly every moon in the Solar System rotates synchronously; the 
only confirmed exception is Hyperion, which is driven into chaotic 
rotation by a resonance with Titan’”°. Neptune’s highly eccentric 
moon Nereid may also rotate chaotically*', but observational support 
is lacking’**’. We have searched for rotation periods that are consist- 
ent with the light curves of Nix and Hydra (Fig. 4), but results have 
been negative (Extended Data Fig. 6). Although we can sometimes 
find a rotation period that fits a single year’s data (spanning 2-6 
months), no single rotation period is compatible with all three years 
of data. 

Dynamical simulations explain this peculiar result: a binary planet 
tends to drive its moons into chaotic rotation. This is illustrated in 
Fig. 5, showing the simulated rotation period and orientation of Nix 
versus time. The moon has a tendency to lock into near-synchronous 
rotation for brief periods, but these configurations do not persist. 
Figure 5 | Numerical simulations of Nix’s rotation. a, The instantaneous
rotation period is compared to the synchronous rate (dashed line). b, The
orientation is described by the angle between Nix’s long axis and the direction
towards the barycentre. Nix librates about 0° or 180° for periods of time, but it
jumps out of these states frequently. [Horizontal axis: Time (days).]

48 | NATURE | VOL 522 | 4 JUNE 2015

At other times, the moon rotates at a period entirely unrelated to its
orbit. Supplementary Video 1 provides further insights into the behaviour;
for example, it shows occasional pole flips, a phenomenon
consistent with the observed changes in Nix’s orientation. 
Lyapunov times are estimated to be a few months, or just a few multi- 
ples of the moons’ orbital periods. The timescale of the chaos depends 
on initial conditions and on assumptions about the axial ratios of the 
moons. The torques acting on a less-elongated body such as Hydra are 
weaker, but nevertheless our integrations support chaos. 

According to integrations spanning a few centuries, a moon that 
begins in synchronous rotation will stay there, albeit with large libra- 
tions. It is therefore possible for synchronous rotation about Pluto and 
Charon to be stable. However, the large and regular torques of Pluto 
and Charon probably swamp the small effects of tidal dissipation 
within the moons, so they never have a pathway to synchronous lock. 

Both photometry and dynamical models support the hypothesis 
that Nix and Hydra are in chaotic rotation. The mechanism is similar 
to that driving Hyperion’s chaos, with Charon playing Titan’s role. 
However, Titan’s influence on Hyperion is magnified by a strong 
orbital resonance. For a binary such as Pluto—Charon, it appears to 
be a general result that non-spherical moons may rotate chaotically; 
no resonance is required. 


Future observations 


The New Horizons spacecraft will fly past Pluto on 14 July 2015. At 
that time, many of the questions raised by this paper will be addressed. 
Although Kerberos will not be well resolved (2-3 km per pixel), 
images will settle the question of whether it is darker than the other 
moons. The albedos and shapes of Nix (imaged at ≲0.5 km per pixel)
and Hydra (at 1 km per pixel) will be very well determined. New 
Horizons will not obtain precise masses for the outer moons, but 
ongoing Earth-based astrometry and dynamical modelling will con- 
tinue to refine these numbers, while also providing new constraints on 
the Laplace-like resonance. Because this resonance has a predicted 
libration period of centuries, the dynamical models will confirm or 
refute it long before a complete libration or circulation period can be 
observed. 

Chaotic dynamics makes it less likely that we will find rings or 
additional moons of Pluto. Within the Styx-Hydra region, the only 
stable orbits are co-orbitals of the known moons. The region beyond 
Hydra appears to be the region in which it is most likely that we will 
find additional moons’’, although some orbits close to Pluto are also 
stable**. Independent of the new discoveries in store, we have already 
learned that Pluto hosts a rich and complex dynamical environment, 
seemingly out of proportion to its diminutive size. 

Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.

METHODS

Data selection and processing. Our data set encompasses all available HST 
images of the Pluto system during 2006 and 2010-2012, plus Kerberos in 2005. 
We neglected HST observations from 2002, 2003, and 2007°”°, because they are of 
generally lower quality, rendering Kerberos and Styx undetectable. We empha- 
sized long exposures through broad-band filters, although brief exposures of 
Charon and Pluto provided geometric reference points. Supplementary Table 1 
lists the images and bodies measured. We analysed the calibrated (‘flt’) image files. 
To detect Kerberos and Styx, it was often necessary to align and co-add multiple 
images from the same visit; files produced in this manner are listed in the table 
with a ‘coadd’ suffix. 

We fitted a model point spread function (PSF) to each detectable body. The 

PSFs were generated using the “Tiny Tim’ software maintained by the Space 
Telescope Science Institute (STScI)’*"°. Upon fitting to the image, the centre of 
the PSF provides the astrometry and the integrated volume under the two-dimen- 
sional curve, minus any background offset, is proportional to the disk-integrated 
photometry. We measured objects in order of decreasing brightness and sub- 
tracted each PSF before proceeding; this reduced the effects of glare on fainter 
objects. Measurements with implausible photometry were rejected; this was gen- 
erally the result of nearby background stars, cosmic ray hits, or other image flaws. 
Further details of the analysis are provided elsewhere’. Styx photometry (Table 1) 
might be biased slightly upward by our exclusion of non-detections; however, 
photometry of the other moons is very robust. 
The Pluto-Charon gravity field. We have simplified the central gravity field by 
taking its time-average. The resulting cylindrically symmetric gravity field can 
then be expressed using the same expansion in spherical harmonics that is tra- 
ditionally employed to describe the field of an oblate planet: 


$$V(r,\theta,\phi) = -\frac{GM}{r}\left[1 - \sum_{m=2}^{\infty} J_m \left(\frac{R}{r}\right)^{m} P_m(\sin\phi)\right] \qquad (1)$$

Here (r, θ, φ) are polar coordinates, where r is radius and θ and φ are longitude
and latitude angles, respectively; G is the gravitation constant, M is the body’s
mass, R is its equatorial radius, P_m is the mth Legendre polynomial, and J_m is the mth
coefficient in the expansion. The dependence on θ and the odd m-terms in the
series vanish by symmetry. The coefficients J_m can be determined by noting that
the potential along the axis of the ring simplifies considerably:

$$V(r,\phi=\pi/2) = -\frac{GM}{r}\left[1 + \left(\frac{R}{r}\right)^{2}\right]^{-1/2} \qquad (2)$$

This can then be compared to the definition of Legendre polynomials:

$$\left(1 - 2xt + t^{2}\right)^{-1/2} = \sum_{m=0}^{\infty} t^{m} P_m(x) \qquad (3)$$

Substituting t = (R/r) and evaluating the expression for x = 0 yields:

$$V(r,\phi=\pi/2) = -\frac{GM}{r}\sum_{m=0}^{\infty}\left(\frac{R}{r}\right)^{m} P_m(0) \qquad (4)$$

Noting that P_m(1) = 1 for all m, equations (1) and (4) can only be equal if the
coefficients J_m are negatives of the Legendre polynomials evaluated at zero: J_2 =
1/2, J_4 = −3/8, J_6 = 5/16, and so on. Given this sequence of coefficients, we can
determine n, ϖ̇ and Ω̇ as functions of semimajor axis a:

$$n^{2}(a) = \frac{GM}{a^{3}}\left[1 - \sum_{m=2}^{\infty} (1+m)\, J_m \left(\frac{R}{a}\right)^{m} P_m(0)\right] \qquad (5a)$$

$$\kappa^{2}(a) = \frac{GM}{a^{3}}\left[1 - \sum_{m=2}^{\infty} (1-m^{2})\, J_m \left(\frac{R}{a}\right)^{m} P_m(0)\right] \qquad (5b)$$

$$\nu^{2}(a) = \frac{GM}{a^{3}}\left[1 - \sum_{m=2}^{\infty} (1+m)^{2}\, J_m \left(\frac{R}{a}\right)^{m} P_m(0)\right] \qquad (5c)$$

Here κ is the epicyclic frequency and ν is the vertical oscillation frequency. It
follows that ϖ̇(a) = n(a) − κ(a) and Ω̇(a) = n(a) − ν(a). In practice, we treated n
as the independent variable because it has the strictest observational constraints,
and then derived a, ϖ̇ and Ω̇ from it.

Orbit fitting. We modelled each orbit as a Keplerian ellipse in the P-C frame, but 
with additional terms to allow for apsidal precession and nodal regression. Our 
model is accurate to first order in e and i; any second-order effects can be 
neglected because they would be minuscule compared to the precision of our 
measurements. 


We also required an estimate for the location of the system barycentre in each set 
of images. Because HST tracking is extremely precise between consecutive images, 
the barycentre location was only calculated once per HST orbit. We solved for the 
barycentre locations first and then held them fixed for subsequent modelling of the 
orbital elements. Barycentre locations were derived from the astrometry of Pluto, 
Charon, Nix and Hydra. We locked Pluto and Charon to the latest ephemeris 
distributed by the Jet Propulsion Laboratory, PLU043°. We accounted for the offset 
between the centre of light and centre of body for Pluto using the latest albedo map”’. 
However, because the number of Pluto and Charon measurements is limited, we 
also allowed Nix and Hydra to contribute to the solution. For each single year 2006- 
2012, we solved simultaneously for the barycentre location in each image set and also 
for orbital elements of Nix and Hydra. For the detection of Kerberos in 2005, the 
only available pointing reference was Hydra, which we derived from PLU043. By 
allowing many measurements to contribute to our barycentre determinations, we 
could improve their quality but also limit any bias introduced by shortcomings of 
our orbit models. The derived uncertainties in the barycentre locations are much 
smaller than any remaining sources of error. 

A nonlinear least-squares fitter identified the best value for each orbital ele- 
ment and also the covariance matrix, from which uncertainties could be derived. 
However, as noted in Table 1 and Extended Data Table 1, our RMS residuals 
(equivalent to the square root of $\chi^2$ per degree of freedom) exceed unity. For Styx
and Kerberos, marginal detections probably contributed to the excess; for Nix and 
Hydra, we have identified the source as the un-modelled wobbles in the orbits. All 
uncertainty estimates have been scaled upward to accommodate these under- 
estimates. 

During the orbit fits, we rejected individual points with excessive residuals, 
based on the assumption that they were misidentifications or the results of poor 
PSF fits. Extended Data Table 1 lists values for the number of included ($M_1$) and
rejected ($M_0$) measurements. Rejecting points, however, would bias our
uncertainty estimates downward. We compensated by running Monte Carlo
simulations in which we generated ($M_0 + M_1$) Gaussian distributed, two-dimensional
random variables and then rejected the $M_0$ that fell furthest from the origin. The
standard deviation among the remainder then gave us an estimate of the factor by 
which we might have inadvertently reduced our error bars. With this procedure, 
accidentally rejecting a small number of valid measurements would not bias the 
uncertainties. 
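
The rejection test described above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the authors' code: the function name, the number of trials and the use of the RMS scatter about the origin (rather than a sample standard deviation) are illustrative choices. The counts 47 and 19 are the Styx values of $M_1$ and $M_0$ listed in Extended Data Table 1.

```python
import math
import random


def rejection_shrink_factor(m_kept, m_rejected, trials=2000, seed=0):
    """Ratio of apparent to true per-axis scatter after outlier rejection.

    Draws (m_kept + m_rejected) points from a unit 2-D Gaussian, discards the
    m_rejected points furthest from the origin, and measures the RMS scatter
    per axis of the survivors. The true scatter is 1 by construction.
    """
    rng = random.Random(seed)
    total = m_kept + m_rejected
    mean_square = 0.0
    for _ in range(trials):
        points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(total)]
        # Sort by squared distance from the origin; the closest m_kept survive.
        points.sort(key=lambda p: p[0] ** 2 + p[1] ** 2)
        kept = points[:m_kept]
        mean_square += sum(x * x + y * y for x, y in kept) / (2 * m_kept)
    return math.sqrt(mean_square / trials)


# Example with the Styx counts (47 included, 19 rejected).
factor = rejection_shrink_factor(47, 19)
print(f"apparent scatter is about {factor:.2f} of the true scatter")
```

Dividing a fitted uncertainty by a factor of this kind is one way to undo the downward bias; with no rejected points the factor stays close to 1.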

We also explored the implications of making various assumptions about how
the orbital elements are coupled (Extended Data Table 1). For the purposes of this
paper, we have adopted the N = 8 solutions in which $\dot\Omega$ can be derived from n and
$\dot\varpi$. This assumption is helpful because, when e and i are small, the frequencies $\dot\varpi$
and $\dot\Omega$ are especially difficult to measure. By allowing them to be coupled, we
obtained more robust results. Nevertheless, our expectation that $\dot\varpi$ and $\dot\Omega$ should
be roughly equal in magnitude but opposite in sign has been well supported by
most of our uncoupled, N = 9 fits.

Resonance analysis. We have defined a general resonance using a set of integer
coefficients ($p_j$, $q_j$, $r_j$). The strength of a resonance is equal to
$C(p, q, r)\,\prod_j \mu_j \prod_j e_j^{|q_j|} \sin^{|r_j|}(i_j)$, where $\mu_j$ is the ratio of the mass of moon j to the
mass of Pluto. The first product $\prod_j$ excludes the mass of the smallest moon involved,
because a resonance can exist even if one moon is a massless test particle. The
function C defines a strength factor, but because it has no simple expression, we
ignore it in this analysis except to note, qualitatively, that the strongest resonances
tend to involve small coefficients and/or small differences between coefficients.

We performed an exhaustive search for all possible resonances involving up to
four non-zero coefficients, with $|p| \le 300$, $|q| \le 4$ and $|r| \le 4$. Symmetry
dictates that the coefficients sum to zero and that $\sum_j r_j$ must be even. Because
Charon follows a circular, equatorial orbit, $q_C = r_C = 0$. We first identified
possible resonances by $|\dot\Phi| < 0.1°$ per day, and then followed up by evaluating $\Phi$
for each year. Sets of coefficients for which $\Phi$ values clustered near 0° or 180° were
given preference. We also favoured sets of coefficients that have simple physical
interpretations, and where the absolute values were small and/or close to one
another.
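
A heavily reduced version of this search can be sketched as follows. It is an illustration only: it considers just the longitude coefficients p of three moons (all q and r set to zero, so the evenness condition on the r coefficients is met trivially), uses a much smaller bound than 300, and takes the mean motions from the N = 8 multi-year fits in Extended Data Table 1. The function names are hypothetical.

```python
from itertools import product

# Mean motions in degrees per day (N = 8 multi-year fits, Extended Data Table 1).
MEAN_MOTION = {"Styx": 17.855770, "Nix": 14.484222, "Hydra": 9.423647}


def resonance_rate(coefficients, rates=MEAN_MOTION):
    """Time derivative of the resonant argument, in degrees per day."""
    return sum(p * rates[moon] for moon, p in coefficients.items())


def search_three_body(p_max=5, threshold=0.1):
    """List coefficient triples that sum to zero and have |rate| < threshold."""
    moons = list(MEAN_MOTION)
    found = []
    for ps in product(range(-p_max, p_max + 1), repeat=len(moons)):
        if not any(ps) or sum(ps) != 0:
            continue  # skip the trivial set and sets that violate the symmetry rule
        if next(p for p in ps if p) < 0:
            continue  # (p) and (-p) describe the same resonance; keep one of them
        rate = resonance_rate(dict(zip(moons, ps)))
        if abs(rate) < threshold:
            found.append((ps, rate))
    return sorted(found, key=lambda item: abs(item[1]))


for ps, rate in search_three_body():
    print(ps, f"{rate:+.4f} deg/day")
```

With these inputs the only surviving set is (3, -5, 2) for Styx, Nix and Hydra, the Laplace-like combination referred to under Orbital integrations.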

Orbital integrations. Our orbit simulations employed the numeric integrator 
SWIFT”. We used PLU043° as our reference ephemeris; it provides state vec- 
tors (positions and velocities) for all the bodies in the system versus time. For 
simplicity, we neglected bodies outside the Pluto system in most integrations. The 
Sun is the dominant external perturber, shifting the moons by a few tens of 
kilometres, primarily in longitude, after one Pluto orbit of 248 years; this is 
<1% of our orbital uncertainties. 

Each integration must begin with initial state vectors and masses for each body. 
However, the state vectors and masses are closely coupled; any change to one 
mass requires that we adjust all of the state vectors in order to match the observed 
orbits. Ideally, this would be accomplished by re-fitting to all of the available 
astrometry, but that task is beyond the scope of this paper. To simplify the
problem, we generated false astrometry derived directly from PLU043, but 
sampled at the times of all prior HST visits that detected one or more of the four 
outer moons. Such measurements date back to 11 June 2002°*°. For each set of 
assumed masses, we used a nonlinear least-squares fitter to solve for the initial 
state vectors that optimally matched this astrometry. A similar technique was 
used to model the effects of moon masses on the chaotic dynamics of the Uranus 
system’. This procedure guarantees that our numeric integrations will match the 
actual astrometry with reasonable accuracy, regardless of the masses assumed. 

For a few numerical experiments, we investigated the consequences of placing 
Styx exactly into its Laplace-like resonance (Extended Data Figs 2 and 3). We 
accomplished this by generating a different set of false astrometry, in which the 
position of Styx was derived from the requirement that $\Phi$ = 180° at all times.
Photometry. Our numerical simulations suggest that typical rotation periods for 
each moon are comparable to the orbital period, that is, several weeks. Because 
this timescale is long compared to one or a few of HST’s 95-min orbits, we 
combined measurements obtained from single or adjacent orbits. In 
Supplementary Table 1, adjacent orbits are indicated by an orbit number of 2 
or 3. Our photometry (Fig. 4 and Extended Data Fig. 4) is defined by the mean and 
standard deviation of all measurements from a single set of orbits. 

We considered two simple models for the light curves described as reflectivity
A versus time t:

$$A_1(t) = c_0 + c_1 \sin(\omega t) + c_2 \cos(\omega t) \tag{6a}$$

$$A_2(t) = c_0 + c_1 \sin(\omega t) + c_2 \cos(\omega t) + c_3 \sin(2\omega t) + c_4 \cos(2\omega t) \tag{6b}$$

We then sought the frequency $\omega$ that minimizes residuals. Given the small number
of measurements in individual years, it was inappropriate to attempt more
sophisticated models. Results are shown in Extended Data Fig. 6. For the data 
from 2010, we did identify frequencies where the residuals are especially small, 
suggesting that we may have identified a rotation rate for that subset of the data. 
However, in no case does a frequency persist from 2010 to 2012. 
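
A frequency scan of this kind can be sketched for the simpler model (6a). For each trial frequency the coefficients enter linearly, so they follow from a small least-squares problem, and the frequency with the smallest RMS residual is kept. The function names, the frequency grid and the synthetic data are assumptions made for illustration; they are not taken from the paper.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def rms_residual(times, values, omega):
    """RMS residual of the best-fit c0 + c1 sin(omega t) + c2 cos(omega t)."""
    basis = [[1.0, math.sin(omega * t), math.cos(omega * t)] for t in times]
    normal = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * y for b, y in zip(basis, values)) for i in range(3)]
    coeffs = solve3(normal, rhs)
    residuals = [y - sum(c * f for c, f in zip(coeffs, b)) for b, y in zip(basis, values)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def best_frequency(times, values, omegas):
    """Trial frequency with the smallest RMS residual."""
    return min(omegas, key=lambda w: rms_residual(times, values, w))


# Synthetic, irregularly sampled light curve with a true frequency of 0.3 rad per day.
times = [1.7 * k + 0.3 * math.sin(k) for k in range(30)]
values = [2.0 + 0.5 * math.sin(0.3 * t) - 0.2 * math.cos(0.3 * t) for t in times]
omegas = [0.05 + 0.01 * k for k in range(96)]
print(best_frequency(times, values, omegas))
```

Model (6b) would add two further basis functions at twice the frequency and a 5x5 system; the scan itself is unchanged.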

Shape modelling. We have described the axial orientation relative to the line of
sight using sub-Earth planetocentric latitude $\phi$. The hypothetical light curve
of an ellipsoid is roughly sinusoidal; its projected cross-section on the sky
varies between extremes $A_{\min}$ and $A_{\max}$. If $\phi = 0$, $A_{\min} = \pi bc$ and $A_{\max} = \pi ac$.
If $\phi = 90°$, then $A_{\min} = A_{\max} = \pi ab$. More generally

$$A_{\min}(a, b, c, \phi) = \pi b\,(c^2 \cos^2\phi + a^2 \sin^2\phi)^{1/2} \tag{7a}$$

$$A_{\max}(a, b, c, \phi) = \pi a\,(c^2 \cos^2\phi + b^2 \sin^2\phi)^{1/2} \tag{7b}$$

If $\phi$ is fixed and each measurement was obtained at a uniformly distributed,
random rotational phase, then the conditional probability density function for
a cross-section A given $A_{\min}$ and $A_{\max}$ is:

$$P(A \mid A_{\min}, A_{\max}) \propto \left(1 - [(A - A_0)/\Delta A]^2\right)^{-1/2} \tag{8}$$

where $A_0 = (A_{\max} + A_{\min})/2$ and $\Delta A = (A_{\max} - A_{\min})/2$. In reality, each
measurement A has an associated uncertainty $\sigma$. This has the effect of convolving
P with a normal distribution $N(A, \sigma)$, with zero mean and standard deviation $\sigma$.

$$P[\sigma](A \mid A_{\min}, A_{\max}) \propto \left(1 - [(A - A_0)/\Delta A]^2\right)^{-1/2} \otimes N(A, \sigma) \tag{9}$$

where the $\otimes$ operator represents convolution.
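
Equations (7) and (8) translate directly into code. The sketch below is illustrative: the function names and the example axes are assumptions, and the density is normalized explicitly (the paper states it only up to proportionality). The convolution with the measurement uncertainty in equation (9) is not included.

```python
import math


def cross_section_extremes(a, b, c, phi_deg):
    """A_min and A_max of equations (7a) and (7b) for semi-axes a >= b >= c."""
    phi = math.radians(phi_deg)
    cos2, sin2 = math.cos(phi) ** 2, math.sin(phi) ** 2
    a_min = math.pi * b * math.sqrt(c * c * cos2 + a * a * sin2)
    a_max = math.pi * a * math.sqrt(c * c * cos2 + b * b * sin2)
    return a_min, a_max


def phase_pdf(area, a_min, a_max):
    """Equation (8), normalized to unit integral; requires a_max > a_min."""
    mid = 0.5 * (a_max + a_min)
    half_range = 0.5 * (a_max - a_min)
    x = (area - mid) / half_range
    if abs(x) >= 1.0:
        return 0.0  # cross-sections outside [A_min, A_max] cannot occur
    return 1.0 / (math.pi * half_range * math.sqrt(1.0 - x * x))


# Example: a 3 x 2 x 1 ellipsoid viewed at a sub-Earth latitude of 30 degrees.
low, high = cross_section_extremes(3.0, 2.0, 1.0, 30.0)
print(low, high, phase_pdf(0.5 * (low + high), low, high))
```

The density peaks at the two extremes, which is why a sparse set of measurements tends to cluster near $A_{\min}$ and $A_{\max}$ rather than near the mean.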

However, simulations show that $\phi$ varies due to chaotic rotation driven by the
central binary (Supplementary Video 1). To simplify this analysis, we have
assumed that $\phi$ was fixed for the whole of each year during which we obtained
data, but that changes may have occurred between years; this is generally
consistent with the time spans of our data sets (a few months per year) and the
infrequency of large pole changes in the simulations. This leads us to define three
unknowns: $\phi_{2010}$, $\phi_{2011}$ and $\phi_{2012}$. Because $A_{\min}$ and $A_{\max}$ depend only on $\sin^2\phi$
and $\cos^2\phi$, we replace the unknowns $\phi$ by $S = \sin^2\phi$ in our analysis.

We have a vector of independent measurements $\mathbf{A} = (A_0, A_1, \ldots)$ and
uncertainties $\boldsymbol\sigma = (\sigma_0, \sigma_1, \ldots)$, so the joint, conditional probability of obtaining all our
measurements is a product:

$$P(\mathbf{A} \mid a, b, c, S_{2010}, S_{2011}, S_{2012}) = \prod_k P[\sigma_k]\big(A_k \mid A_{\min}(a, b, c, S_{\mathrm{year}(k)}),\, A_{\max}(a, b, c, S_{\mathrm{year}(k)})\big) \tag{10}$$

where year(k) is the year associated with measurement k. What we seek instead is the
joint, conditional probability density function $P(a, b, c, S_{2010}, S_{2011}, S_{2012} \mid \mathbf{A})$. This
is a problem in Bayesian analysis:

$$P(a, b, c, S_{2010}, S_{2011}, S_{2012} \mid \mathbf{A}) = P(\mathbf{A} \mid a, b, c, S_{2010}, S_{2011}, S_{2012})\, P(a, b, c, S_{2010}, S_{2011}, S_{2012}) \,/\, P(\mathbf{A}) \tag{11}$$

Here $P(\mathbf{A})$ and $P(a, b, c, S_{2010}, S_{2011}, S_{2012})$ represent our assumed ‘prior
probability’ distributions for these quantities. We have no prior information about our
measurements $A_k$, so we assume that they are uniformly distributed. The second
prior can be broken down as

$$P(a, b, c, S_{2010}, S_{2011}, S_{2012}) = P(a, b, c)\, P(S_{2010})\, P(S_{2011})\, P(S_{2012}) \tag{12}$$



because orientations are independent of shape and of one another. If the pole
in each year is randomly distributed over $4\pi$ steradians, then $P(\phi) \propto \cos\phi$ and
$P(S) \propto S^{-1/2}$.
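
The prior on S can be checked numerically. A density proportional to $S^{-1/2}$ on [0, 1] has cumulative distribution $\sqrt{S}$, so the fraction of isotropically oriented poles with $\sin^2\phi$ below some value s should approach $\sqrt{s}$. The sketch below draws isotropic directions by normalizing three-dimensional Gaussian vectors; the function names and sample size are illustrative assumptions.

```python
import math
import random


def sample_s(n, seed=1):
    """Draw n values of S = sin^2(latitude) for isotropically distributed poles."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # A normalized 3-D Gaussian vector points in a uniformly random direction.
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        sin_latitude = z / math.sqrt(x * x + y * y + z * z)
        samples.append(sin_latitude ** 2)
    return samples


def empirical_cdf(samples, s):
    """Fraction of samples not exceeding s."""
    return sum(1 for value in samples if value <= s) / len(samples)


samples = sample_s(20000)
for s in (0.04, 0.25, 0.64):
    print(s, round(empirical_cdf(samples, s), 3), "expected", math.sqrt(s))
```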

We model our prior for the shape as $P(a, b, c) = P_1(u)\,P_2(v)\,P_3(w)$, where u =
abc; v = a/b; and w = b/c. This states that we will regard the ellipsoid’s volume and
its two axial ratios as statistically independent. We have assumed that log(u) is
uniformly distributed rather than u itself, which implies $P_1(u) \propto 1/u$. Experience
with other irregularly shaped planetary objects suggests that large ratios a/b and 
b/c are disfavoured, with values rarely exceeding 2. After some experimentation, 
we adopted P2(v) x 1/ v and P3(w) « 1/w’. Alternative but similar assumptions 
had little effect on our results. 

The above equations provide a complete solution to the joint probability
function $P(a, b, c, S_{2010}, S_{2011}, S_{2012} \mid \mathbf{A})$. We solved for the complete six-dimensional
function, represented as a six-dimensional array. Quantities listed in Table 1 were
derived as the mean and standard deviation of P along each of its six axes, with S
converted back to $\phi$. Extended Data Fig. 5 compares the distribution of
measurements by year with the reconstructed probability distributions.

Simulations of rigid body rotation. The orientation of the ellipsoid can be defined
by a unit quaternion: $q = [\cos(\theta/2), \sin(\theta/2)\,\mathbf{u}]$ represents a rotation by angle $\theta$
about unit axis vector $\mathbf{u}$. The time-derivative $dq/dt = [0, \boldsymbol\omega] \cdot q/2$, where $\boldsymbol\omega$ is the
spin vector. We used a Bulirsch-Stoer integrator to track q, dq/dt, $\mathbf{x}$ and $d\mathbf{x}/dt$,
where $\mathbf{x}$ is the position of the ellipsoid relative to the barycentre. The
forces and torques acting were defined by Pluto and Charon following
fixed circular paths around the barycentre; this motion was pre-defined for
the simulations, not integrated numerically. We derived $d^2\mathbf{x}/dt^2$ from the
gravity force of each body on the ellipsoid. We also required the
second derivative of q: $d^2q/dt^2 = [-|\boldsymbol\omega|^2/2, \boldsymbol\alpha] \cdot q/2$, where $\boldsymbol\alpha$ is the
time-derivative of $\boldsymbol\omega$. We related $\boldsymbol\alpha$ to the torque applied by Pluto and Charon
on the ellipsoid

$$\boldsymbol\tau = 3GM_P\, \mathbf{r}_P \times (I\,\mathbf{r}_P)/|\mathbf{r}_P|^5 + 3GM_C\, \mathbf{r}_C \times (I\,\mathbf{r}_C)/|\mathbf{r}_C|^5 \tag{13}$$

where $\mathbf{r}_k = \mathbf{x} - \mathbf{x}_k$ is the vector offset from each body centre to the ellipsoid’s
centre and I is the ellipsoid’s moment of inertia tensor. In the internal
frame of the ellipsoid, the moment of inertia tensor $I_0$ is diagonal, with
$I_{11} = (M/5)(b^2 + c^2)$, $I_{22} = (M/5)(a^2 + c^2)$, and $I_{33} = (M/5)(a^2 + b^2)$. It
is rotated to the system coordinate frame via the rotation matrix R, which can
be calculated from q: $I = R\,I_0\,R^T$. We then solve for $\boldsymbol\alpha$ via the relation
$\boldsymbol\tau = I\boldsymbol\alpha + \boldsymbol\omega \times I\boldsymbol\omega$.

Code availability. Portions of our software are available at https://github.com/seti/pds-tools.
We have opted not to release the entire source code because it is
built on top of additional large libraries representing decades of development.
Instead, we have documented our algorithms with sufficient detail to enable
others to reproduce our results.

Extended Data Figure 1 | Variations in orbital elements by year. Changes
in mean motion (a), eccentricity (b) and inclination (c) are shown during
2010-2012 for Nix (red), Kerberos (green) and Hydra (blue). Vertical bars
are ±1σ. Each individual point is a fit to a single year of data (compare with
Extended Data Table 1). In a, Δn is the mean motion of each body minus its
average during 2006-2012.

Extended Data Figure 2 | The role of Kerberos in the Laplace-like
resonance. We have initiated an integration with Styx exactly in its resonance
with Nix and Hydra, and then have allowed it to evolve for 10,000 years. The
diagrams are for $M_K$ nominal (a), $M_K$ reduced by 1σ (b) and $M_K = 0$ (c). The
amplitude of the libration is stable when Kerberos is massless, but shows erratic
variations otherwise.

Extended Data Figure 3 | Spectral signatures of Kerberos. We merge Pluto
and Charon into a single central body and integrate $\Phi(t)$ for Styx in exact
resonance. The fast Fourier transform (FFT) power spectrum for $M_K = 0$ (light
grey) obscures the same spectrum obtained when $M_K$ is nominal. Unobscured
spikes are caused by Kerberos. a, The impulses of Kerberos passing each moon
create a signature at the synodic period and its overtones: $S_{SK}$ = 53.98 days
(green); $S_{NK}$ = 109.24 days (red); $S_{KH}$ = 203.92 days (blue). b, Harmonics of
the second resonance, with period $42S_{NK} \approx 43S_{SN} \approx$ 4,590 days, are also visible.
The 3/2 harmonic is unexplained.

Extended Data Figure 4 | Satellite phase curves. Raw disk-integrated
photometry has been plotted versus phase angle $\alpha$ for Nix (a) and Hydra
(b). Vertical bars are ±1σ. An opposition surge is apparent. A simple
parametric model for the phase curve is shown: $c(1 + d/\alpha)$, where d is fixed but c
is scaled to fit each moon during each year. Measurements and curves are
colour-coded by year: red for 2010, green for 2011, and blue for 2012.

Extended Data Figure 5 | […] year. The black curves show the theoretical probability density function (PDF)
of A by year for Nix (a, 2010; b, 2011; c, 2012) and Hydra (d, 2010; e, 2011;
f, 2012), after convolution with the measurement uncertainties. The histogram
[…] statistics, the measurements appear to be well described by the models, which
have been derived via Bayesian analysis.

Extended Data Figure 6 | Searches for rotation periods in the light curves.
We fitted a simple model involving a frequency and its first harmonic to the
photometry (see equation (6)) of Nix (a) and Hydra (b). Curves are plotted for
data from 2010 (red), 2012 (blue) and for three years 2010-2012 (black). Local
minima with RMS residuals <1 indicate a plausible fit. The orbital periods and
half-periods are identified; if either moon were in synchronous rotation, we
would expect to see minima near either P (for albedo variations) or P/2 (for
irregular shapes).

Extended Data Table 1 | Orbital elements based on coupling various orbital elements and based on subsets of the data.

Moon  Years  N  a (km)  $\lambda_0$ (°)  n (°/day)  e ($10^{-3}$)  $\varpi_0$ (°)  $\dot\varpi$ (°/day)  i (°)  $\Omega_0$ (°)  $\dot\Omega$ (°/day)  P (days)  $P/P_C$  RMS (σ)  RMS (mas)  RMS (km)  $M_1$  $M_0$

Styx 2006-2012 9 42,662 276.8627 17.855814 5.892 296.15 0.49961 0.819 182.64 -0.50097 20.16150 3.156534 1.45 17.9 397 47 19
± 81 0.0983 0.000255 1.179 9.49 0.02288 0.164 12.60 0.02376 0.00029 0.000050

Styx 2006-2012 8 42,656 276.8562 17.855770 5.787 296.05 0.50581 0.809 183.36 -0.49187 20.16155 3.156542 1.44 17.8 397 47 19
± 78 0.0955 0.000235 1.144 9.40 0.01405 0.162 12.50 0.00027 0.000046

Styx 2006-2012 7 42,484 276.5918 17.855355 1.162 347.19 0.37688 0.381 176.10 -0.36908 20.16202 3.156615 1.79 19.3 429 47 19
± 82 0.1046 0.000286 1.117 64.15 0.242 27.27 0.00032 0.000052

Styx 2006-2012 6 42,422 276.5781 17.855333 1.054 7.70 0.37688 0.302 169.14 -0.36908 20.16204 3.156619 1.79 19.1 426 47 19
± 0.1031 0.000285 1.102 65.26 0.215 33.76 0.00032 0.000052

Styx 2010 7 43,549 239.9346 17.840396 7.733 138.44 0.37600 2.502 0.88 -0.36824 20.17892 3.159262 1.29 12.9 288 7 4
± 617 0.2837 0.005338 2.225 16.61 0.894 13.64 0.00604 0.00604 0.000871

Styx 2011 3 42,383 277.0515 17.807823 3.165041 1.10 16.1 363 12 1
± 0.1879 0.018868 0.003818

Styx 2012 7 42,856 332.3448 17.868293 6.915 116.85 0.37763 1.215 11.66 -0.36982 20.14742 3.154330 1.46 19.1 432 26 13
± 117 0.2178 0.014056 1.712 11.95 0.297 13.73 0.01585 0.01585 0.003199

Nix 2006-2012 9 48,697 63.8733 14.484221 2.022 220.27 0.19074 0.139 358.77 -0.15203 24.85463 3.891303 2.58 4.22 94 831 27
± 3 0.0059 0.000015 0.050 1.41 0.00436 0.008 3.41 0.00842 0.00003 0.000004

Nix 2006-2012 8 48,694 63.8655 14.484222 2.036 221.64 0.18325 0.133 3.73 -0.18096 24.85463 3.891302 2.59 4.25 95 831 27
± 3 0.0056 0.000015 0.050 1.40 0.00409 0.008 3.40 0.00003 0.000004

Nix 2006-2012 7 48,696 63.8580 14.484244 2.022 213.64 0.21395 0.133 15.03 -0.21084 24.85459 3.891296 2.65 4.35 97 831 27
± 3 0.0054 0.000016 0.043 1.15 0.008 3.21 0.00003 0.00003 0.000004

Nix 2006-2012 6 48,693 63.8573 14.484240 2.030 213.40 0.21395 0.132 13.88 -0.21084 24.85460 3.891297 2.65 4.35 97 831 27
± 0.0054 0.000016 0.041 1.13 0.008 2.89 0.00003 0.000050

Nix 2010 7 48,670 177.1349 14.483409 3.297 146.01 0.21392 0.070 319.59 -0.21080 24.85603 3.891521 2.17 4.41 99 85 2
± 10 0.0158 0.000385 0.143 2.37 0.024 20.48 0.00066 0.00066 0.000100

Nix 2011 7 48,670 63.8130 14.484954 1.598 229.57 0.21398 0.107 352.81 -0.21086 24.85338 3.891106 2.75 5.22 117 124 1
± 8 0.0187 0.000346 0.123 4.32 0.023 13.89 0.00059 0.00059 0.000094

Nix 2012 7 48,704 325.0999 14.481191 2.068 292.10 0.21383 0.163 302.70 -0.21072 24.85983 3.892117 2.35 3.71 84 613 14
± 4 0.0067 0.000382 0.060 1.27 0.011 3.51 0.00066 0.00066 0.000101

Kerberos 2005-2012 9 57,832 94.3375 11.191287 3.471 186.59 0.12121 0.356 241.86 -0.20985 32.16788 5.036283 1.27 11.2 248 185 32
± 20 0.0206 0.000063 0.209 3.58 0.00795 0.037 5.48 0.01302 0.00018 0.000030

Kerberos 2005-2012 8 57,783 94.3078 11.191398 3.280 187.64 0.11536 0.389 225.15 -0.11419 32.16756 5.036233 1.26 11.2 249 185 32
± 19 0.0211 0.000050 0.200 3.74 0.00615 0.037 5.43 0.00014 0.000024

Kerberos 2005-2012 7 57,781 94.3074 11.191394 3.272 187.28 0.10957 0.385 225.17 -0.10851 32.16757 5.036234 1.27 11.3 251 185 32
± 19 0.0214 0.000050 0.203 3.75 0.037 5.54 0.00014 0.00014 0.000024

Kerberos 2005-2012 6 57,750 94.3085 11.191397 3.221 187.86 0.10957 0.411 226.88 -0.10851 32.16756 5.036233 1.27 11.2 249 185 32
± 0.0213 0.000050 0.199 3.79 0.035 4.86 0.00014 0.000024

Kerberos 2010 7 57,825 329.5189 11.189590 4.877 140.09 0.10953 0.284 298.05 -0.10846 32.17276 5.037046 1.24 8.78 196 30 10
± 48 0.0542 0.001181 0.481 5.69 0.090 17.99 0.00340 0.00340 0.000561

Kerberos 2011 7 57,776 94.1883 11.194672 1.890 216.87 0.10965 0.515 250.03 -0.10859 32.15815 5.034760 1.19 6.99 157 30 1
± 40 0.0680 0.001446 0.497 14.19 0.084 8.23 0.00415 0.00415 0.000652

Kerberos 2012 7 57,803 230.3510 11.190758 3.335 233.57 0.10955 0.434 172.29 -0.10849 32.16940 5.036521 1.24 12.28 278 119 20
± 38 0.0418 0.004783 0.423 8.24 0.074 10.08 0.01375 0.01375 0.002352

Hydra 2006-2012 9 64,741 197.8685 9.423633 5.837 192.40 0.06842 0.244 191.15 -0.08762 38.20183 5.980972 2.73 3.19 71 835 24
± 3 0.0032 0.000009 0.025 0.26 0.00081 0.005 1.19 0.00317 0.00003 0.000005

Hydra 2006-2012 8 64,738 197.8662 9.423647 5.862 192.22 0.06986 0.242 189.67 -0.06934 38.20177 5.980963 2.77 3.21 72 835 24
± 3 0.0032 0.000008 0.025 0.27 0.00080 0.005 1.17 0.00003 0.000005

Hydra 2006-2012 7 64,738 197.8664 9.423645 5.861 192.04 0.07101 0.242 189.91 -0.07048 38.20178 5.980965 2.77 3.22 72 835 24
± 3 0.0032 0.000008 0.025 0.24 0.005 1.15 0.00003 0.00003 0.000005

Hydra 2006-2012 6 64,721 197.8691 9.423638 5.881 192.04 0.07101 0.249 193.12 -0.07048 38.20181 5.980969 2.80 3.25 72 835 24
± 0.0032 0.000008 0.025 0.24 0.005 0.99 0.00003 0.000005

Hydra 2010 7 64,730 358.2681 9.423299 6.661 165.04 0.07101 0.334 219.90 -0.07048 38.20318 5.981184 2.77 3.07 69 85 2
± 8 0.0079 0.000199 0.080 0.63 0.013 2.25 0.00081 0.00081 0.000124

Hydra 2011 7 64,746 197.8686 9.423495 5.722 192.77 0.07101 0.242 193.93 -0.07048 38.20239 5.981060 2.88 3.24 73 1385 12
± 12 0.0166 0.000157 0.271 2.13 0.030 4.77 0.00064 0.00064 0.000088

Hydra 2012 7 64,739 46.9262 9.422786 5.763 218.15 0.07100 0.214 157.81 -0.07047 38.20526 5.981510 2.35 2.93 66 606 10
± 3 0.0033 0.000592 0.050 0.43 0.006 1.91 0.00240 0.00240 0.000379


Columns $M_1$ and $M_0$ identify the numbers of measurements included in and excluded from the fit; N indicates the number of free parameters. When N = 8, we derived $\dot\Omega$ from the relationship $\nu^2 = 2n^2 - \kappa^2$. For N = 7,
$\dot\varpi$ and $\dot\Omega$ were both derived from n and the gravity field using equations (5b) and (5c). For N = 6, a was also coupled to n via equation (5a). N = 3 indicates a fit to a circular orbit. For fits to single years of data, the
epoch is 1 July UTC for that year. We disfavour N = 7 in the multi-year fits because some residuals increase markedly.

Observation of the rare $B^0_s \to \mu^+\mu^-$ decay from the
combined analysis of CMS and LHCb data


The CMS and LHCb collaborations* 


The standard model of particle physics describes the fundamental 
particles and their interactions via the strong, electromagnetic and 
weak forces. It provides precise predictions for measurable quantities
that can be tested experimentally. The probabilities, or branching
fractions, of the strange B meson ($B^0_s$) and the $B^0$ meson decaying
into two oppositely charged muons ($\mu^+$ and $\mu^-$) are especially interesting
because of their sensitivity to theories that extend the standard
model. The standard model predicts that the $B^0_s \to \mu^+\mu^-$ and
$B^0 \to \mu^+\mu^-$ decays are very rare, with about four of the former occurring
for every billion $B^0_s$ mesons produced, and one of the latter
occurring for every ten billion $B^0$ mesons. A difference in the
observed branching fractions with respect to the predictions of the 
standard model would provide a direction in which the standard 
model should be extended. Before the Large Hadron Collider (LHC) 
at CERN?’ started operating, no evidence for either decay mode had 
been found. Upper limits on the branching fractions were an order 
of magnitude above the standard model predictions. The CMS 
(Compact Muon Solenoid) and LHCb (Large Hadron Collider beauty) 
collaborations have performed a joint analysis of the data from 
proton-proton collisions that they collected in 2011 at a centre-of-mass
energy of seven teraelectronvolts and in 2012 at eight teraelectronvolts.
Here we report the first observation of the $B^0_s \to \mu^+\mu^-$
decay, with a statistical significance exceeding six standard deviations,
and the best measurement so far of its branching fraction.
Furthermore, we obtained evidence for the $B^0 \to \mu^+\mu^-$ decay with
a statistical significance of three standard deviations. Both measurements
are statistically compatible with standard model predic-
tions and allow stringent constraints to be placed on theories beyond 
the standard model. The LHC experiments will resume taking data in 
2015, recording proton-proton collisions at a centre-of-mass energy 
of 13 teraelectronvolts, which will approximately double the production
rates of $B^0_s$ and $B^0$ mesons and lead to further improvements in
the precision of these crucial tests of the standard model.

Experimental particle physicists have been testing the predictions of 
the standard model of particle physics (SM) with increasing precision 
since the 1970s. Theoretical developments have kept pace by improving 
the accuracy of the SM predictions as the experimental results gained in 
precision. In the course of the past few decades, the SM has passed 
critical tests derived from experiment, but it does not address some 
profound questions about the nature of the Universe. For example, the 
existence of dark matter, which has been confirmed by cosmological 
data’, is not accommodated by the SM. It also fails to explain the origin 
of the asymmetry between matter and antimatter, which after the Big 
Bang led to the survival of the tiny amount of matter currently present 
in the Universe**. Many theories have been proposed to modify the SM 
to provide solutions to these open questions. 

The $B^0_s$ and $B^0$ mesons are unstable particles that decay via the weak
interaction. The measurement of the branching fractions of the very
rare decays of these mesons into a dimuon ($\mu^+\mu^-$) final state is especially
interesting.

At the elementary level, the weak force is composed of a ‘charged
current’ and a ‘neutral current’ mediated by the $W^\pm$ and $Z^0$ bosons,
respectively. An example of the charged current is the decay of the $\pi^+$
meson, which consists of an up (u) quark of electrical charge +2/3 of
the charge of the proton and a down (d) antiquark of charge +1/3. A
pictorial representation of this process, known as a Feynman diagram,
is shown in Fig. 1a. The u and d quarks are ‘first generation’ or lowest
mass quarks. Whenever a decay mode is specified in this Letter, the
charge conjugate mode is implied.

The $B^+$ meson is similar to the $\pi^+$, except that the light d antiquark
is replaced by the heavy ‘third generation’ (highest mass quarks)
beauty (b) antiquark, which has a charge of +1/3 and a mass of
~5 GeV/$c^2$ (about five times the mass of a proton). The decay
$B^+ \to \mu^+\nu$, represented in Fig. 1b, is allowed but highly suppressed
because of angular momentum considerations (helicity suppression)
and because it involves transitions between quarks of different generations
(CKM suppression), specifically the third and first generations of
quarks. All b hadrons, including the $B^+$, $B^0_s$ and $B^0$ mesons, decay
predominantly via the transition of the b antiquark to a ‘second generation’
(intermediate mass quarks) charm (c) antiquark, which is less
CKM suppressed, into final states with charmed hadrons. Many
allowed decay modes, which typically involve charmed hadrons and
other particles, have angular momentum configurations that are not
helicity suppressed.

The neutral $B^0_s$ meson is similar to the $B^+$ except that the u quark is
replaced by a second generation strange (s) quark of charge -1/3. The
decay of the $B^0_s$ meson to two muons, shown in Fig. 1c, is forbidden at
the elementary level because the $Z^0$ cannot couple directly to quarks of
different flavours, that is, there are no direct ‘flavour changing neutral
currents’. However, it is possible to respect this rule and still have this
decay occur through ‘higher order’ transitions such as those shown in
Fig. 1d and e. These are highly suppressed because each additional
interaction vertex reduces their probability of occurring significantly.
They are also helicity and CKM suppressed. Consequently, the
branching fraction for the $B^0_s \to \mu^+\mu^-$ decay is expected to be very
small compared to the dominant b antiquark to c antiquark transitions.
The corresponding decay of the $B^0$ meson, where a d quark replaces the
s quark, is even more CKM suppressed because it requires a jump
across two quark generations rather than just one.

The branching fractions, $\mathcal{B}$, of these two decays, accounting for
higher-order electromagnetic and strong interaction effects, and using
lattice quantum chromodynamics to compute the $B^0_s$ and $B^0$ meson
decay constants, are reliably calculated in the SM. Their values are
$\mathcal{B}(B^0_s \to \mu^+\mu^-)_{SM} = (3.66 \pm 0.23) \times 10^{-9}$ and
$\mathcal{B}(B^0 \to \mu^+\mu^-)_{SM} = (1.06 \pm 0.09) \times 10^{-10}$.

Many theories that seek to go beyond the standard model (BSM) 
include new phenomena and particles, such as in the diagrams
shown in Fig. 1f and g, that can considerably modify the SM branching
fractions. In particular, theories with additional Higgs bosons
predict possible enhancements to the branching fractions. A significant
deviation of either of the two branching fraction measurements from 
the SM predictions would give insight on how the SM should be 
extended. Alternatively, a measurement compatible with the SM could 
provide strong constraints on BSM theories. 


*Lists of participants and their affiliations appear in the online version of the paper. 
The ratio of the branching fractions of the two decay modes provides
powerful discrimination among BSM theories. It is predicted in
the SM (refs 1, 13 (updates available at http://itpwiki-unibe.ch/), 14,
15 (updated results and plots available at http://www.slac.stanford.edu/xorg/hfag/))
to be $\mathcal{R} \equiv \mathcal{B}(B^0 \to \mu^+\mu^-)_{SM} / \mathcal{B}(B^0_s \to \mu^+\mu^-)_{SM} = 0.0295^{+0.0028}_{-0.0025}$.
Notably, BSM theories with the property of minimal
flavour violation predict the same value as the SM for this ratio.

The first evidence for the decay $B^0_s \to \mu^+\mu^-$ was presented by the
LHCb collaboration in 2012. Both CMS and LHCb later published
results from all data collected in proton-proton collisions at centre-of-mass
energies of 7 TeV in 2011 and 8 TeV in 2012. The measurements
had comparable precision and were in good agreement, although
neither of the individual results had sufficient precision to constitute
the first definitive observation of the $B^0_s$ decay to two muons.

In this Letter, the two sets of data are combined and analysed
simultaneously to exploit fully the statistical power of the data and
to account for the main correlations between them. The data correspond
to total integrated luminosities of 25 fb$^{-1}$ and 3 fb$^{-1}$ for the
CMS and LHCb experiments, respectively, equivalent to a total of
approximately $10^{12}$ $B^0_s$ and $B^0$ mesons produced in the two experiments
together. Assuming the branching fractions given by the SM
and accounting for the detection efficiencies, the predicted numbers of
decays to be observed in the two experiments together are about 100
for $B^0_s \to \mu^+\mu^-$ and 10 for $B^0 \to \mu^+\mu^-$.

The CMS” and LHCb”' detectors are designed to measure SM phe- 
nomena with high precision and search for possible deviations. The two 
collaborations use different and complementary strategies. In addition to 
performing a broad range of precision tests of the SM and studying the 
newly-discovered Higgs boson, CMS is designed to search for and
study new particles with masses from about 100 GeV/$c^2$ to a few TeV/$c^2$.
Since many of these new particles would be able to decay into b quarks 
and many of the SM measurements also involve b quarks, the detection of 
b-hadron decays was a key element in the design of CMS. The LHCb 
collaboration has optimized its detector to study matter-antimatter 
asymmetries and rare decays of particles containing b quarks, aiming 
to detect deviations from precise SM predictions that would indicate 
BSM effects. These different approaches, reflected in the design of the 
detectors, lead to instrumentation of complementary angular regions 
with respect to the LHC beams, to operation at different proton-proton 
collision rates, and to selection of b quark events with different efficiency 
(for experimental details, see Methods). In general, CMS operates at a 
higher instantaneous luminosity than LHCb but has a lower efficiency 
for reconstructing low-mass particles, resulting in a similar sensitivity to
LHCb for $B^0$ or $B^0_s$ (denoted hereafter by $B^0_{(s)}$) mesons decaying into two
muons.

Muons do not have strong nuclear interactions and are too mas- 
sive to emit a substantial fraction of their energy by electromagnetic 
Figure 1 | Feynman diagrams related to the $B^0_s \to \mu^+\mu^-$ decay. a, $\pi^+$ meson
decay through the charged-current process; b, $B^+$ meson decay through the
charged-current process; c, a $B^0_s$ decay through the direct flavour changing
neutral current process, which is forbidden in the SM, as indicated by a large red
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radiation. This gives them the unique ability to penetrate dense mate- 
rials, such as steel, and register signals in detectors embedded deep 
within them. Both experiments use this characteristic to identify 
muons. 

The experiments follow similar data analysis strategies. Decays
compatible with $B^0_{(s)} \to \mu^+\mu^-$ (candidate decays) are found by
combining the reconstructed trajectories (tracks) of oppositely charged
particles identified as muons. The separation between genuine
$B^0_{(s)} \to \mu^+\mu^-$ decays and random combinations of two muons
(combinatorial background), most often from semi-leptonic decays of two
different b hadrons, is achieved using the dimuon invariant mass,
$m_{\mu^+\mu^-}$, and the established characteristics of $B^0_{(s)}$-meson decays. For
example, because of their lifetimes of about 1.5 ps and their production
at the LHC with momenta between a few GeV/c and ~100 GeV/c, $B^0_{(s)}$
mesons travel up to a few centimetres before they decay. Therefore, the
$B^0_{(s)} \to \mu^+\mu^-$ 'decay vertex', from which the muons originate, is
required to be displaced with respect to the 'production vertex',
the point where the two protons collide. Furthermore, the negative
of the $B^0_{(s)}$ candidate's momentum vector is required to point back to
the production vertex.
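
The displacement and pointing requirements can be illustrated with a short numerical sketch. It is not the experiments' reconstruction code: the function names, the example momentum of 100 GeV/c and the toy vertex coordinates are assumptions chosen for illustration, while the 1.5 ps lifetime and the $B^0_s$ mass of 5.367 GeV/$c^2$ are the values quoted in this Letter.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in mm per picosecond

def mean_flight_distance_mm(p_gev, mass_gev, lifetime_ps):
    """Mean lab-frame flight distance L = (p / m) * c * tau, in mm."""
    beta_gamma = p_gev / mass_gev
    return beta_gamma * C_MM_PER_PS * lifetime_ps

def pointing_angle(production_vertex, decay_vertex, momentum):
    """Angle (rad) between the flight direction (production vertex to
    decay vertex) and the candidate momentum; small for a genuine decay."""
    flight = [d - p for d, p in zip(decay_vertex, production_vertex)]
    dot = sum(f * m for f, m in zip(flight, momentum))
    norm = math.sqrt(sum(f * f for f in flight)) * math.sqrt(sum(m * m for m in momentum))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# Hypothetical candidate: 100 GeV/c momentum, B0s mass, 1.5 ps lifetime.
distance = mean_flight_distance_mm(100.0, 5.367, 1.5)
# Toy vertices (mm) and momentum (GeV/c) that are exactly aligned.
angle = pointing_angle((0.0, 0.0, 0.0), (0.3, 0.4, 5.0), (3.0, 4.0, 50.0))
print(f"mean flight distance: {distance:.1f} mm, pointing angle: {angle:.2e} rad")
```

The mean distance at the upper end of the momentum range is just under a centimetre, so flight distances of a few centimetres correspond to decays in the exponential tail of the lifetime distribution.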

These criteria, amongst others that have some ability to distinguish 
known signal events from background events, are combined into 
boosted decision trees (BDTs). A BDT is an ensemble of decision
trees each placing different selection requirements on the individual 
variables to achieve the best discrimination between ‘signal-like’ and 
‘background-like’ events. Both experiments evaluated many variables 
for their discriminating power and each chose the best set of about ten 
to be used in its respective BDT. These include variables related to the 
quality of the reconstructed tracks of the muons; kinematic variables 
such as transverse momentum (with respect to the beam axis) of the 
individual muons and of the $B^0_{(s)}$ candidate; variables related to the
decay vertex topology and fit quality, such as candidate decay length; 
and isolation variables, which measure the activity in terms of other 
particles in the vicinity of the two muons or their displaced vertex. A 
BDT must be ‘trained’ on collections of known background and signal 
events to generate the selection requirements on the variables and the 
weights for each tree. In the case of CMS, the background events used 
in the training are taken from intervals of dimuon mass above and 
below the signal region in data, while simulated events are used for the 
signal. The data are divided into disjoint sub-samples and the BDT 
trained on one sub-sample is applied to a different sub-sample to avoid 
any bias. LHCb uses simulated events for background and signal in the 
training of its BDT. After training, the relevant BDT is applied to each 
event in the data, returning a single value for the event, with high 
values being more signal-like. To avoid possible biases, both experiments
kept the small mass interval that includes both the $B^0_s$ and $B^0$
signals blind until all selection criteria were established.
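
The idea of a boosted ensemble trained on one sub-sample and applied to a disjoint one can be sketched in a few lines. This is a toy under stated assumptions, not either experiment's BDT: the boosting scheme is plain AdaBoost, each "tree" is a single cut (a decision stump), and the two Gaussian input variables are hypothetical stand-ins for quantities such as decay length or isolation.

```python
import math
import random

def train_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the best single cut."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] > threshold else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, threshold, sign)
    return best

def stump_output(stump, x):
    _, j, threshold, sign = stump
    return sign if x[j] > threshold else -sign

def train_bdt(X, y, n_trees=10):
    """AdaBoost: each new stump focuses on events the previous ones got wrong."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_trees):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-9), 1.0 - 1e-9)
        alpha = 0.5 * math.log((1.0 - err) / err)  # weight of this tree
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_output(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def bdt_score(ensemble, x):
    """Single discriminant in [-1, 1]; higher values are more signal-like."""
    norm = sum(alpha for alpha, _ in ensemble)
    return sum(alpha * stump_output(s, x) for alpha, s in ensemble) / norm

def toy_sample(rng, n, is_signal):
    """Two hypothetical discriminating variables, shifted for signal."""
    centre = 2.0 if is_signal else 0.0
    X = [(rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)) for _ in range(n)]
    return X, [1 if is_signal else -1] * n

rng = random.Random(1)
sig_a, y_sig_a = toy_sample(rng, 60, True)    # sub-sample A: used for training
bkg_a, y_bkg_a = toy_sample(rng, 60, False)
sig_b, _ = toy_sample(rng, 60, True)          # sub-sample B: disjoint, used for evaluation
bkg_b, _ = toy_sample(rng, 60, False)

bdt_a = train_bdt(sig_a + bkg_a, y_sig_a + y_bkg_a)
mean_sig = sum(bdt_score(bdt_a, x) for x in sig_b) / len(sig_b)
mean_bkg = sum(bdt_score(bdt_a, x) for x in bkg_b) / len(bkg_b)
print(f"mean score on held-out signal: {mean_sig:.2f}, background: {mean_bkg:.2f}")
```

Evaluating only on events that were not used in training is what keeps the discriminant's response unbiased on the sample it is applied to.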

Figure 1 (continued) | 'X'; d, e, higher-order flavour-changing neutral current processes for the
$B^0_s \to \mu^+\mu^-$ decay allowed in the SM; and f and g, examples of processes for the
same decay in theories extending the SM, where new particles, denoted $X^0$ and
$X^+$, can alter the decay rate.

In addition to the combinatorial background, specific b-hadron
decays, such as $B^0 \to \pi^-\mu^+\nu$, where the neutrino cannot be detected
and the charged pion is misidentified as a muon, or $B^0 \to \pi^0\mu^+\mu^-$,
where the neutral pion in the decay is not reconstructed, can mimic the
dimuon decay of the $B^0_{(s)}$ mesons. The invariant mass of the
reconstructed dimuon candidate for these processes (semi-leptonic
background) is usually smaller than the mass of the $B^0_s$ or $B^0$ meson because
the neutrino or another particle is not detected. There is also a
background component from hadronic two-body $B^0_{(s)}$ decays (peaking
background), such as $B^0 \to K^+\pi^-$, when both hadrons from the decay are
misidentified as muons. These misidentified decays can produce peaks
in the dimuon invariant-mass spectrum near the expected signal,
especially for the $B^0 \to \mu^+\mu^-$ decay. Particle identification algorithms
are used to minimize the probability that pions and kaons are
misidentified as muons, and thus suppress these background sources.
Excellent mass resolution is mandatory for distinguishing between
$B^0$ and $B^0_s$ mesons, which have a mass difference of about 87 MeV/$c^2$, and
for separating them from backgrounds. The mass resolution for
$B^0_s \to \mu^+\mu^-$ decays in CMS ranges from 32 to 75 MeV/$c^2$, depending
on the direction of the muons relative to the beam axis, while LHCb
achieves a uniform mass resolution of about 25 MeV/$c^2$.
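
A quick way to see why the resolution matters is to express the 87 MeV/$c^2$ mass difference in units of the resolution. The snippet below is only this arithmetic; the labels "CMS central" and "CMS forward" are shorthand assumed here for the two ends of the quoted CMS range.

```python
MASS_DIFFERENCE = 87.0  # MeV/c^2, B0s - B0 mass difference quoted in the text

def separation_in_sigma(resolution_mev):
    """Peak separation divided by the mass resolution."""
    return MASS_DIFFERENCE / resolution_mev

for label, resolution in [("LHCb", 25.0), ("CMS central", 32.0), ("CMS forward", 75.0)]:
    print(f"{label}: peaks separated by {separation_in_sigma(resolution):.1f} resolutions")
```

With a resolution of 75 MeV/$c^2$ the two peaks are barely more than one resolution apart, which is why candidates are later categorized by angular region.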

The CMS and LHCb data are combined by fitting a common value for
each branching fraction to the data from both experiments. The
branching fractions are determined from the observed numbers,
efficiency-corrected, of $B^0_{(s)}$ mesons that decay into two muons and the total
numbers of $B^0_{(s)}$ mesons produced. Both experiments derive the latter
from the number of observed $B^+ \to J/\psi K^+$ decays, whose branching
fraction has been precisely measured elsewhere (ref. 14). Assuming equal rates
for $B^+$ and $B^0$ production, this gives the normalization for $B^0 \to \mu^+\mu^-$.
To derive the number of $B^0_s$ mesons from this $B^+$ decay mode, the ratio
of b quarks that form (hadronize into) $B^+$ mesons to those that form $B^0_s$
mesons is also needed. Measurements of this ratio, for which there is
additional discussion in Methods, and of the branching fraction
$\mathcal{B}(B^+ \to J/\psi K^+)$ are used to normalize both sets of data and are
constrained within Gaussian uncertainties in the fit. The use of these two
results by both CMS and LHCb is the only significant source of
correlation between their individual branching fraction measurements. The
combined fit takes advantage of the larger data sample to increase the
precision while properly accounting for the correlation.




In the simultaneous fit to both the CMS and LHCb data, the
branching fractions of the two signal channels are common parameters of
interest and are free to vary. Other parameters in the fit are considered
as nuisance parameters. Those for which additional knowledge is
available are constrained to be near their estimated values by using
Gaussian penalties with their estimated uncertainties, while the others
are free to float in the fit. The ratio of the hadronization probability
into $B^+$ and $B^0_s$ mesons and the branching fraction of the
normalization channel $B^+ \to J/\psi K^+$ are common, constrained parameters.
Candidate decays are categorized according to whether they were
detected in CMS or LHCb and to the value of the relevant BDT
discriminant. In the case of CMS, they are further categorized according
to the data-taking period and, because of the large variation in mass
resolution with angle, according to whether the muons are both produced
at large angles relative to the proton beams (central region) or at least one
muon is emitted at a small angle relative to the beams (forward region).
An unbinned extended maximum likelihood fit to the dimuon
invariant-mass distribution, in a region of about ±500 MeV/$c^2$ around the
$B^0_s$ mass, is performed simultaneously in all categories (12 categories
from CMS and eight from LHCb). Likelihood contours in the plane of
the parameters of interest, $\mathcal{B}(B^0 \to \mu^+\mu^-)$ versus $\mathcal{B}(B^0_s \to \mu^+\mu^-)$, are
obtained by constructing the test statistic $-2\Delta\ln L$ from the difference
in log-likelihood ($\ln L$) values between fits with fixed values for the
parameters of interest and the nominal fit. For each of the two
branching fractions, a one-dimensional profile likelihood scan is likewise
obtained by fixing only the single parameter of interest and allowing
the other to vary during the fits. Additional fits are performed where
the parameters under consideration are the ratios of the branching
fractions relative to their SM predictions,
$\mathcal{S}^{B^0_{(s)}}_{\mathrm{SM}} \equiv \mathcal{B}(B^0_{(s)} \to \mu^+\mu^-)/\mathcal{B}(B^0_{(s)} \to \mu^+\mu^-)_{\mathrm{SM}}$,
or the ratio $\mathcal{R}$ of the two branching fractions.
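
The mechanics of a profile likelihood scan with a Gaussian-constrained nuisance parameter can be shown on a much simpler model than the unbinned fit used here. The sketch below assumes a single-bin counting experiment with a signal strength `mu` and one background-scale nuisance parameter `theta`; the yields, the 10% constraint and the grid minimisation are all hypothetical choices made for illustration.

```python
import math

def nll(mu, theta, n_obs, s_exp, b_exp, sigma_theta):
    """Negative log-likelihood: Poisson term plus Gaussian penalty on theta."""
    expected = mu * s_exp + theta * b_exp
    poisson = expected - n_obs * math.log(expected)
    penalty = 0.5 * ((theta - 1.0) / sigma_theta) ** 2
    return poisson + penalty

def profiled_nll(mu, n_obs, s_exp, b_exp, sigma_theta):
    """Minimise over the nuisance parameter on a grid (theta in [0.5, 1.5])."""
    thetas = (0.5 + 0.001 * i for i in range(1001))
    return min(nll(mu, t, n_obs, s_exp, b_exp, sigma_theta) for t in thetas)

def likelihood_scan(mu_values, n_obs, s_exp, b_exp, sigma_theta):
    """Return -2 * Delta ln L for each fixed value of the parameter of interest."""
    profile = [profiled_nll(mu, n_obs, s_exp, b_exp, sigma_theta) for mu in mu_values]
    best = min(profile)
    return [2.0 * (value - best) for value in profile]

# Hypothetical inputs: 25 observed events, 10 expected signal, 10 expected background.
mu_values = [0.01 * i for i in range(301)]
scan = likelihood_scan(mu_values, n_obs=25, s_exp=10.0, b_exp=10.0, sigma_theta=0.1)
best_mu = mu_values[scan.index(min(scan))]
print(f"best-fit mu = {best_mu:.2f}, -2 Delta ln L at mu = 0: {scan[0]:.1f}")
```

Fixing the parameter of interest and re-minimising over everything else at each scan point is the same pattern as in the fit described above, only with far fewer parameters.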

The combined fit result is shown for all 20 categories in Extended
Data Fig. 1. To represent the result of the fit in a single dimuon
invariant-mass spectrum, the mass distributions of all categories,
weighted according to values of S/(S + B), where S is the expected
number of $B^0_s$ signals and B is the number of background events under
the $B^0_s$ peak in that category, are added together and shown in Fig. 2.
The result of the simultaneous fit is overlaid. An alternative
representation of the fit to the dimuon invariant-mass distribution for the six
categories with the highest S/(S + B) value for CMS and LHCb, as well
as displays of events with high probability to be genuine signal decays,
are shown in Extended Data Figs 2-4.

The combined fit leads to the measurements
$\mathcal{B}(B^0_s \to \mu^+\mu^-) = (2.8^{+0.7}_{-0.6}) \times 10^{-9}$ and
$\mathcal{B}(B^0 \to \mu^+\mu^-) = (3.9^{+1.6}_{-1.4}) \times 10^{-10}$, where the
uncertainties include both statistical and systematic sources, the latter
contributing 35% and 18% of the total uncertainty for the $B^0_s$ and $B^0$
signals, respectively. Using Wilks' theorem, the statistical
significance in units of standard deviations, $\sigma$, is computed to be 6.2 for the
$B^0_s \to \mu^+\mu^-$ decay mode and 3.2 for the $B^0 \to \mu^+\mu^-$ mode. For each
signal the null hypothesis that is used to compute the significance
includes all background components predicted by the SM as well as
the other signal, whose branching fraction is allowed to vary freely. The
median expected significances assuming the SM branching fractions
are 7.4$\sigma$ and 0.8$\sigma$ for the $B^0_s$ and $B^0$ modes, respectively. Likelihood
contours for $\mathcal{B}(B^0 \to \mu^+\mu^-)$ versus $\mathcal{B}(B^0_s \to \mu^+\mu^-)$ are shown in Fig. 3.
One-dimensional likelihood scans for both decay modes are displayed
in the same figure. In addition to the likelihood scan, the statistical
significance and confidence intervals for the $B^0$ branching fraction are
determined using simulated experiments. This determination yields a
significance of 3.0$\sigma$ for a $B^0$ signal with respect to the same null
hypothesis described above. Following the Feldman-Cousins procedure,
±1$\sigma$ and ±2$\sigma$ confidence intervals for $\mathcal{B}(B^0 \to \mu^+\mu^-)$ of
[2.5, 5.6] × $10^{-10}$ and [1.4, 7.4] × $10^{-10}$ are obtained, respectively (see Extended
Data Fig. 5).

Figure 2 | Weighted distribution of the dimuon invariant mass, $m_{\mu^+\mu^-}$, for
all categories. Superimposed on the data points in black are the combined fit
(solid blue line) and its components: the $B^0_s$ (yellow shaded area) and $B^0$
(light-blue shaded area) signal components; the combinatorial background
(dash-dotted green line); the sum of the semi-leptonic backgrounds (dotted salmon
line); and the peaking backgrounds (dashed violet line). The horizontal bar on
each histogram point denotes the size of the binning, while the vertical bar
denotes the 68% confidence interval. See main text for details on the weighting
procedure.

Figure 3 | Likelihood contours in the $\mathcal{B}(B^0 \to \mu^+\mu^-)$ versus
$\mathcal{B}(B^0_s \to \mu^+\mu^-)$ plane. The (black) cross in a marks the best-fit central value.
The SM expectation and its uncertainty is shown as the (red) marker. Each
contour encloses a region approximately corresponding to the reported
confidence level. b, c, Variations of the test statistic $-2\Delta\ln L$ for $\mathcal{B}(B^0_s \to \mu^+\mu^-)$
(b) and $\mathcal{B}(B^0 \to \mu^+\mu^-)$ (c). The dark and light (cyan) areas define the ±1$\sigma$ and
±2$\sigma$ confidence intervals for the branching fraction, respectively. The SM
prediction and its uncertainty for each branching fraction is denoted with the
vertical (red) band.

Figure 4 | Variation of the test statistic $-2\Delta\ln L$ as a function of the ratio of
branching fractions $\mathcal{R} \equiv \mathcal{B}(B^0 \to \mu^+\mu^-)/\mathcal{B}(B^0_s \to \mu^+\mu^-)$. The dark and light
(cyan) areas define the ±1$\sigma$ and ±2$\sigma$ confidence intervals for $\mathcal{R}$, respectively.
The value and uncertainty for $\mathcal{R}$ predicted in the SM, which is the same in BSM
theories with the minimal flavour violation (MFV) property, is denoted with
the vertical (red) band.

70 | NATURE | VOL 522 | 4 JUNE 2015
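
For a single parameter of interest, Wilks' theorem is commonly used to convert the test statistic into a significance through $Z = \sqrt{-2\Delta\ln L}$. The sketch below shows only that conversion and the corresponding one-sided Gaussian tail probability; the function names are illustrative, the one-sided convention is an assumption, and the input values are back-computed from the quoted significances rather than taken from the fit.

```python
import math

def significance_from_test_statistic(q0):
    """Z = sqrt(q0) for one parameter of interest (asymptotic approximation)."""
    return math.sqrt(q0)

def one_sided_p_value(z):
    """Gaussian tail probability beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Test-statistic values that would correspond to the quoted 6.2 and 3.2 sigma.
for q0 in (6.2 ** 2, 3.2 ** 2):
    z = significance_from_test_statistic(q0)
    print(f"-2 Delta ln L = {q0:.2f} -> Z = {z:.1f}, p = {one_sided_p_value(z):.2e}")
```

Because the asymptotic approximation can fail for small signals, cross-checking the $B^0$ result with simulated experiments, as described above, is a natural complement.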

The fit for the ratios of the branching fractions relative to their SM
predictions yields $\mathcal{S}^{B^0_s}_{\mathrm{SM}} = 0.76^{+0.20}_{-0.18}$ and
$\mathcal{S}^{B^0}_{\mathrm{SM}} = 3.7^{+1.6}_{-1.4}$. Associated
likelihood contours and one-dimensional likelihood scans are shown in
Extended Data Fig. 6. The measurements are compatible with the SM
branching fractions of the $B^0_s \to \mu^+\mu^-$ and $B^0 \to \mu^+\mu^-$ decays at the
1.2$\sigma$ and 2.2$\sigma$ level, respectively, when computed from the
one-dimensional hypothesis tests. Finally, the fit for the ratio of branching
fractions yields $\mathcal{R} = 0.14^{+0.08}_{-0.06}$, which is compatible with the SM at the
2.3$\sigma$ level. The one-dimensional likelihood scan for this parameter is
shown in Fig. 4.

The combined analysis of data from CMS and LHCb, taking advantage
of their full statistical power, establishes conclusively the existence
of the $B^0_s \to \mu^+\mu^-$ decay and provides an improved measurement of its
branching fraction. This concludes a search that started more than
three decades ago (see Extended Data Fig. 7), and initiates a phase of
precision measurements of the properties of this decay. It also
produces three-standard-deviation evidence for the $B^0 \to \mu^+\mu^-$ decay. The
measured branching fractions of both decays are compatible with SM
predictions. This is the first time that the CMS and LHCb
collaborations have performed a combined analysis of sets of their data in order
to obtain a statistically significant observation.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Experimental setup. At the Large Hadron Collider (LHC), two counter-rotating 
beams of protons, contained and guided by superconducting magnets spaced 
around a 27 km circular tunnel, located approximately 100 m underground near 
Geneva, Switzerland, are brought into collision at four interaction points (IPs). 
The study presented in this Letter uses data collected at energies of 3.5 TeV per 
beam in 2011 and 4 TeV per beam in 2012 by the CMS and LHCb experiments 
located at two of these IPs. 

The CMS and LHCb detectors are both designed to look for phenomena beyond
the SM (BSM), but using complementary strategies. The CMS detector, shown in
Extended Data Fig. 3, is optimized to search for yet unknown heavy particles, with
masses ranging from 100 GeV/$c^2$ to a few TeV/$c^2$, which, if observed, would be a
direct manifestation of BSM phenomena. Since many of the hypothesized new
particles can decay into particles containing b quarks or into muons, CMS is able to
detect efficiently and study $B^0$ (5,280 MeV/$c^2$) and $B^0_s$ (5,367 MeV/$c^2$) mesons
decaying to two muons even though it is designed to search for particles with much
larger masses. The CMS detector covers a very large range of angles and momenta
to reconstruct high-mass states efficiently. To that end, it employs a 13 m long,
6 m diameter superconducting solenoid magnet, operated at a field of 3.8 T, centred
on the IP with its axis along the beam direction and covering both hemispheres. A 
series of silicon tracking layers, consisting of silicon pixel detectors near the beam 
and silicon strips farther out, organized in concentric cylinders around the beam, 
extending to a radius of 1.1 m and terminated on each end by planar detectors 
(disks) perpendicular to the beam, measures the momentum, angles, and position 
of charged particles emerging from the collisions. Tracking coverage starts from 
the direction perpendicular to the beam and extends to within 220 mrad from it on 
both sides of the IP. The inner three cylinders and disks, extending from 4.3 to 10.7
cm in radius transverse to the beam, are arrays of 100 × 150 $\mu$m$^2$ silicon pixels,
which can distinguish the displacement of the b-hadron decays from the primary
vertex of the collision. The silicon strips, covering radii from 25 cm to
approximately 110 cm, have pitches ranging from 80 to 183 $\mu$m. The impact parameter is
measured with a precision of 10 $\mu$m for transverse momenta of 100 GeV/c and
20 $\mu$m for 10 GeV/c. The momentum resolution, provided mainly by the silicon
strips, changes with the angle relative to the beam direction, resulting in a mass
resolution for $B^0_{(s)} \to \mu^+\mu^-$ decays that varies from 32 MeV/$c^2$ for $B^0_{(s)}$ mesons
produced perpendicularly to the proton beams to 75 MeV/$c^2$ for those produced at
small angles relative to the beam direction. After the tracking system, at a greater
distance from the IP, there is a calorimeter that stops (absorbs) all particles except 
muons and measures their energies. The calorimeter consists of an electromag- 
netic section followed by a hadronic section. Muons are identified by their ability 
to penetrate the calorimeter and the steel return yoke of the solenoid magnet and 
to produce signals in gas-ionization particle detectors located in compartments 
within the steel yoke. The CMS detector has no capability to discriminate between 
charged hadron species, pions, kaons, or protons, that is effective at the typical 
particle momenta in this analysis. 

The primary commitment of the LHCb collaboration is the study of particle- 
antiparticle asymmetries and of rare decays of particles containing b and c quarks. 
LHCb aims at detecting BSM particles indirectly by measuring their effect on 
b-hadron properties for which precise SM predictions exist. The production cross 
section of b hadrons at the LHC is particularly large at small angles relative to the 
colliding beams. The small-angle region also provides advantages for the detection 
and reconstruction of a wide range of their decays. The LHCb experiment, shown
in Extended Data Fig. 4, instruments the angular interval from 10 to 300 mrad with 
respect to the beam direction on one side of the interaction region. Its detectors are 
designed to reconstruct efficiently a wide range of b-hadron decays, resulting in 
charged pions and kaons, protons, muons, electrons, and photons in the final state. 
The detector includes a high-precision tracking system consisting of a silicon strip 
vertex detector, a large-area silicon strip detector located upstream of a dipole 
magnet characterized by a field integral of 4 T m, and three stations of silicon strip 
detectors and straw drift tubes downstream of the magnet. The vertex detector has 
sufficient spatial resolution to distinguish the slight displacement of the weakly 
decaying b hadron from the primary production vertex where the two protons 
collided and produced it. The tracking detectors upstream and downstream of the 
dipole magnet measure the momenta of charged particles. The combined tracking 
system provides a momentum measurement with an uncertainty that varies from 
0.4% at 5 GeV/c to 0.6% at 100 GeV/c. This results in an invariant-mass resolution
of 25 MeV/$c^2$ for $B^0_{(s)}$ mesons decaying to two muons that is nearly independent of
the angle with respect to the beam. The impact parameter resolution is smaller
than 20 $\mu$m for particle tracks with large transverse momentum. Different types of
charged hadrons are distinguished by information from two ring-imaging 
Cherenkov detectors. Photon, electron, and hadron candidates are identified by 
calorimeters. Muons are identified by a system composed of alternating layers of 
iron and multiwire proportional chambers. 

Neither CMS nor LHCb records all the interactions occurring at its IP
because the data storage and analysis costs would be prohibitive. Since most 
of the interactions are reasonably well characterized (and can be further 
studied by recording only a small sample of them) specific event filters (known 
as triggers) select the rare processes that are of interest to the experiments. 
Both CMS and LHCb implement triggers that specifically select events con- 
taining two muons. The triggers of both experiments have a hardware stage, 
based on information from the calorimeter and muon systems, followed by a 
software stage, consisting of a large computing cluster that uses all the 
information from the detector, including the tracking, to make the final
selection of events to be recorded for subsequent analysis. Since CMS is designed to
look for much heavier objects than $B^0_{(s)}$ mesons, it selects events that contain
muons with higher transverse momenta than those selected by LHCb. This
eliminates many of the $B^0_{(s)}$ decays while permitting CMS to run at a higher
proton-proton collision rate to look for the rarer massive particles. Thus
CMS runs at higher collision rates but with lower efficiency than LHCb for $B^0_{(s)}$
mesons decaying to two muons. The overall sensitivity to these decays turns
out to be similar in the two experiments.

CMS and LHCb are not the only collaborations to have searched for $B^0_s \to \mu^+\mu^-$
and $B^0 \to \mu^+\mu^-$ decays. Over three decades, a total of eleven collaborations have
taken part in this search, as illustrated by Extended Data Fig. 7. This plot gathers
the results from CLEO, ARGUS, UA1, CDF, L3, DØ, Belle,
BaBar, LHCb, CMS and ATLAS.

Analysis description. The analysis techniques used to obtain the results presented
in this Letter are very similar to those used to obtain the individual result in each
collaboration, described in more detail in refs 18, 19. Here only the main analysis
steps are reviewed and the changes used in the combined analysis are highlighted.
Data samples for this analysis were collected by the two experiments in
proton-proton collisions at a centre-of-mass energy of 7 and 8 TeV during 2011 and 2012,
respectively. These samples correspond to a total integrated luminosity of 25 and
3 fb$^{-1}$ for the CMS and LHCb experiments, respectively, and represent their
complete data sets from the first running period of the LHC.

The trigger criteria were slightly different between the two experiments. The
large majority of events were triggered by requirements on one or both muons of
the signal decay: the LHCb detector triggered on muons with transverse
momentum $p_T$ > 1.5 GeV/c while the CMS detector, because of its geometry
and higher instantaneous luminosity, triggered on two muons with $p_T$ > 4 (3)
GeV/c, for the leading (sub-leading) muon.

The data analysis procedures in the two experiments follow similar strategies.
Pairs of high-quality oppositely charged particle tracks that have one of the
expected patterns of hits in the muon detectors are fitted to form a common vertex
in three dimensions, which is required to be displaced from the primary
interaction vertex (PV) and to have a small $\chi^2$ in the fit. The resulting $B^0_{(s)}$ candidate is
further required to point back to the PV, for example, to have a small impact
parameter, consistent with zero, with respect to it. The final classification of data
events is done in categories of the response of a multivariate discriminant (MVA)
combining information from the kinematics and vertex topology of the events.
The type of MVA used is a boosted decision tree (BDT). The branching
fractions are then obtained by a fit to the dimuon invariant mass, $m_{\mu^+\mu^-}$, of all
categories simultaneously.
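
The quantity being fitted, the dimuon invariant mass, follows from the two reconstructed muon momenta through $m^2 = (E_1 + E_2)^2 - |\vec{p}_1 + \vec{p}_2|^2$ in natural units. The sketch below is illustrative only: the function names are hypothetical, and the example momenta are constructed so that the pair reconstructs to the $B^0_s$ mass quoted in Methods.

```python
import math

MUON_MASS = 0.1056584  # GeV/c^2

def energy(momentum, mass=MUON_MASS):
    """Relativistic energy E = sqrt(|p|^2 + m^2), in GeV (natural units)."""
    return math.sqrt(sum(c * c for c in momentum) + mass * mass)

def dimuon_invariant_mass(p1, p2):
    """Invariant mass of two muons from their momentum 3-vectors (GeV/c)."""
    total_energy = energy(p1) + energy(p2)
    px, py, pz = (a + b for a, b in zip(p1, p2))
    mass_squared = total_energy ** 2 - (px * px + py * py + pz * pz)
    return math.sqrt(max(mass_squared, 0.0))

# Back-to-back muons in the rest frame of a parent with mass 5.367 GeV/c^2.
parent_mass = 5.367
p = math.sqrt((parent_mass / 2.0) ** 2 - MUON_MASS ** 2)
print(f"m(mu+ mu-) = {dimuon_invariant_mass((p, 0.0, 0.0), (-p, 0.0, 0.0)):.3f} GeV/c^2")
```

If one of the two tracks is really a pion or kaon, or if a decay product is missing, the same formula returns a shifted mass, which is the origin of the peaking and semi-leptonic backgrounds.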

The signals appear as peaks at the $B^0_s$ and $B^0$ masses in the invariant-mass
distributions, observed over background events. One of the components of the
background is combinatorial in nature, as it is due to the random combinations of
genuine muons. These produce a smooth dimuon mass distribution in the vicinity
of the $B^0_s$ and $B^0$ masses, estimated in the fit to the data by extrapolation from the
sidebands of the invariant-mass distribution. In addition to the combinatorial
background, certain specific b-hadron decays can mimic the signal or contribute
to the background in its vicinity. In particular, the semi-leptonic decays
$B^0 \to \pi^-\mu^+\nu$, $B^0_s \to K^-\mu^+\nu$, and $\Lambda^0_b \to p\mu^-\bar{\nu}$ can have reconstructed masses that are
near the signal if one of the hadrons is misidentified as a muon and is combined
with a genuine muon. Similarly, the dimuon coming from the rare $B^0 \to \pi^0\mu^+\mu^-$
and $B^+ \to \pi^+\mu^+\mu^-$ decays can also fake the signal. All these background decays,
when reconstructed as a dimuon final state, have invariant masses that are lower
than the masses of the $B^0_s$ and $B^0$ mesons, because they are missing one of the
original decay particles. An exception is the decay $\Lambda^0_b \to p\mu^-\bar{\nu}$, which can also
populate, with a smooth mass distribution, higher-mass regions. Furthermore,
background due to misidentified hadronic two-body decays $B^0_{(s)} \to h^+h'^-$, where
$h^{(\prime)} = \pi$ or K, is present when both hadrons are misidentified as muons. These
misidentified decays produce an apparent dimuon invariant-mass peak close to
the $B^0$ mass value. Such a peak can mimic a $B^0 \to \mu^+\mu^-$ signal and is estimated
from control channels and added to the fit.

The distributions of signal in the invariant mass and in the MVA discriminant
are derived from simulations with a detailed description of the detector response
for CMS and are calibrated using exclusive two-body hadronic decays in data for
LHCb. The distributions for the backgrounds are obtained from simulation with
the exception of the combinatorial background. The latter is obtained by
interpolating from the data invariant-mass sidebands separately for each category, after
the subtraction of the other background components.

To compute the signal branching fractions, the numbers of $B^0_s$ and $B^0$ mesons
that are produced, as well as the numbers of those that have decayed into a dimuon
pair, are needed. The latter numbers are the raw results of this analysis, whereas the
former need to be determined from measurements of one or more 'normalization'
decay channels, which are abundantly produced, have an absolute branching
fraction that is already known with good precision, and that share characteristics
with the signals, so that their trigger and selection efficiencies do not differ
significantly. Both experiments use the $B^+ \to J/\psi K^+$ decay as a normalization
channel with $\mathcal{B}(B^+ \to J/\psi(\mu^+\mu^-)K^+) = (6.10 \pm 0.19) \times 10^{-5}$, and LHCb also
uses the $B^0 \to K^+\pi^-$ channel with $\mathcal{B}(B^0 \to K^+\pi^-) = (1.96 \pm 0.05) \times 10^{-5}$. Both
branching fraction values are taken from ref. 14. Hence, the $B^0_s \to \mu^+\mu^-$ branching
fraction is expressed as a function of the number of signal events ($N_{B^0_s \to \mu^+\mu^-}$) in the
data normalized to the numbers of $B^+ \to J/\psi K^+$ and $B^0 \to K^+\pi^-$ events:

$$\mathcal{B}(B^0_s \to \mu^+\mu^-) = \frac{N_{B^0_s \to \mu^+\mu^-}}{N_{\mathrm{norm.}}} \times \frac{f_d}{f_s} \times \frac{\varepsilon_{\mathrm{norm.}}}{\varepsilon_{B^0_s \to \mu^+\mu^-}} \times \mathcal{B}_{\mathrm{norm.}} = \alpha_{\mathrm{norm.}} \times N_{B^0_s \to \mu^+\mu^-} \qquad (1)$$

where the 'norm.' subscript refers to either of the normalization channels. The
values of the normalization parameter $\alpha_{\mathrm{norm.}}$ obtained by LHCb from the two
normalization channels are found to be in good agreement and their weighted average is
used. In this formula $\varepsilon$ indicates the total event detection efficiency including
geometrical acceptance, trigger selection, reconstruction, and analysis selection
for the corresponding decay. The $f_d/f_s$ factor is the ratio of the probabilities for a b
quark to hadronize into a $B^0$ as compared to a $B^0_s$ meson; the probability to
hadronize into a $B^+$ ($f_u$) is assumed to be equal to that into $B^0$ ($f_d$) on the basis
of theoretical grounds, and this assumption is checked on data. The value of
$f_d/f_s = 3.86 \pm 0.22$ measured by LHCb is used in this analysis. As the value of $f_d/f_s$
depends on the kinematic range of the considered particles, which differs between
LHCb and CMS, CMS checked this observable with the decays $B^0_s \to J/\psi\phi$ and
$B^+ \to J/\psi K^+$ within its acceptance, finding a consistent value. An additional
systematic uncertainty of 5% was assigned to $f_d/f_s$ to account for the extrapolation
of the LHCb result to the CMS acceptance. An analogous formula to that in
equation (1) holds for the normalization of the $B^0 \to \mu^+\mu^-$ decay, with the notable
difference that the $f_d/f_s$ factor is replaced by $f_d/f_u = 1$.
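
Equation (1) translates directly into a few lines of arithmetic. In the sketch below the values of $f_d/f_s$ and of the normalization branching fraction are those quoted in the text, whereas the normalization yield, the two efficiencies and the signal yield are hypothetical numbers chosen only to exercise the formula.

```python
def alpha_norm(n_norm, fd_over_fs, eff_norm, eff_signal, br_norm):
    """Single-event sensitivity: branching fraction per observed signal decay."""
    return (fd_over_fs * eff_norm * br_norm) / (n_norm * eff_signal)

def branching_fraction(n_signal, alpha):
    """Equation (1): B = alpha_norm * N_signal."""
    return alpha * n_signal

FD_OVER_FS = 3.86           # quoted in the text
BR_NORM = 6.10e-5           # B(B+ -> J/psi(mu+ mu-) K+), quoted in the text
N_NORM = 1.0e6              # hypothetical normalization-channel yield
EFF_NORM, EFF_SIGNAL = 0.02, 0.01   # hypothetical total efficiencies

alpha = alpha_norm(N_NORM, FD_OVER_FS, EFF_NORM, EFF_SIGNAL, BR_NORM)
print(f"alpha_norm = {alpha:.3e}")
print(f"B for 6 hypothetical signal decays = {branching_fraction(6, alpha):.2e}")
```

For the $B^0$ mode the same function applies with the hadronization ratio set to 1, as stated above.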

The antiparticle $\bar{B}^0$ ($\bar{B}^0_s$) and the particle $B^0$ ($B^0_s$) can both decay into two muons
and no attempt is made in this analysis to determine whether the antiparticle or
particle was produced (untagged method). However, the $B^0$ and $B^0_s$ particles are
known to oscillate, that is, to transform continuously into their antiparticles and
vice versa. Therefore, a quantum superposition of particle and antiparticle states
propagates in the laboratory before decaying. This superposition can be described
by two 'mass eigenstates', which are symmetric and antisymmetric in the
charge-parity (CP) quantum number, and have slightly different masses. In the SM, the
heavy eigenstate can decay into two muons, whereas the light eigenstate cannot
without violating the CP quantum number conservation. In BSM models, this is
not necessarily the case. In addition to their masses, the two eigenstates of the $B^0_s$
system also differ in their lifetime values. The lifetimes of the light and heavy
eigenstates are also different from the average $B^0_s$ lifetime, which is used by CMS
and LHCb in the simulations of signal decays. Since the information on the
displacement of the secondary decay with respect to the PV is used as a
discriminant against combinatorial background in the analysis, the efficiency versus
lifetime has a model-dependent bias that must be removed. This bias is estimated
assuming SM dynamics. Owing to the smaller difference between the lifetime of its
heavy and light mass eigenstates, no correction is required for the $B^0$ decay mode.

Detector simulations are needed by both CMS and LHCb. CMS relies on 
simulated events to determine resolutions and trigger and reconstruction effi- 
ciencies, and to provide the signal sample for training the BDT. The dimuon 
mass resolution given by the simulation is validated using data on $J/\psi$, $\Upsilon$, and
Z-boson decays to two muons. The tracking and trigger efficiencies obtained 
from the simulation are checked using special control samples from data. The 
LHCb analysis is designed to minimize the impact of discrepancies between 
simulations and data. The mass resolution is measured with data. The distri- 
bution of the BDT for the signal and for the background is also calibrated with 
data using control channels and mass sidebands. The efficiency ratio for the 
trigger is also largely determined from data. The simulations are used to deter- 
mine the efficiency ratios of selection and reconstruction processes between 
signal and normalization channels. As for the overall detector simulation, each 
experiment has a team dedicated to making the simulations as complete and 
realistic as possible. The simulated data are constantly being compared to the
actual data. Agreement between simulation and data in both experiments is
quite good, often extending well beyond the cores of distributions. Differences
occur because of, for example, an incomplete description of the material of the
detectors, approximations made to keep the computer time manageable, resi- 
dual uncertainties in calibration and alignment, and discrepancies or limita- 
tions in the underlying theory and experimental data used to model the 
relevant collisions and decays. Small differences between simulation and data 
that are known to have an impact on the result are treated either by reweighting 
the simulations to match the data or by assigning appropriate systematic 
uncertainties. 

Small changes are made to the analysis procedure with respect to refs 18, 19 in 
order to achieve a consistent combination between the two experiments. In the 
LHCb analysis, the $\Lambda_b^0 \to p\mu^-\bar{\nu}$ background component, which was not included in 
the fit for the previous result but whose effect was accounted for as an additional 
systematic uncertainty, is now included in the standard fit. The following modifications 
are made to the CMS analysis: the $\Lambda_b^0 \to p\mu^-\bar{\nu}$ branching fraction is updated to 
a more recent prediction® of $\mathcal{B}(\Lambda_b^0 \to p\mu^-\bar{\nu}) = (4.94 \pm 2.19) \times 10^{-4}$; the phase 
space model of the decay $\Lambda_b^0 \to p\mu^-\bar{\nu}$ is changed to a more appropriate semi- 
leptonic decay model®; and the decay time bias correction for the $B_s^0$, previously 
absent from the analysis, is now calculated and applied with a different correction 
for each category of the multivariate discriminant. 

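These modifications result in changes in the individual results of each experiment. 
The modified CMS analysis, applied on the CMS data, yields 

$$\mathcal{B}(B_s^0 \to \mu^+\mu^-) = (2.8^{+1.0}_{-0.9}) \times 10^{-9} \quad\text{and}\quad \mathcal{B}(B^0 \to \mu^+\mu^-) = (4.4^{+2.2}_{-1.9}) \times 10^{-10} \qquad (2)$$

while the LHCb results change to 

$$\mathcal{B}(B_s^0 \to \mu^+\mu^-) = (2.7^{+1.1}_{-0.9}) \times 10^{-9} \quad\text{and}\quad \mathcal{B}(B^0 \to \mu^+\mu^-) = (3.3^{+2.4}_{-2.1}) \times 10^{-10} \qquad (3)$$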


These results are only slightly different from the published ones and are in agree- 
ment with each other. 

Simultaneous fit. The goal of the analysis presented in this Letter is to combine the 
full data sets of the two experiments to reduce the uncertainties on the branching 
fractions of the signal decays obtained from the individual determinations. A sim- 
ultaneous unbinned extended maximum likelihood fit is performed to the data of 
the two experiments, using the invariant-mass distributions of all 20 MVA discriminant 
categories of both experiments. The invariant-mass distributions are defined 
in the dimuon mass ranges $m_{\mu^+\mu^-} \in [4.9, 5.9]$ GeV/$c^2$ and $[4.9, 6.0]$ GeV/$c^2$ for the 
CMS and LHCb experiments, respectively. The branching fractions of the signal 
decays, the hadronization fraction ratio $f_d/f_s$, and the branching fraction of the 
normalization channel $B^+ \to J/\psi K^+$ are treated as common parameters. The value 
of the $B^+ \to J/\psi K^+$ branching fraction is the combination of results from five 
different experiments", taking advantage of all their data to achieve the most precise 
input parameters for this analysis. The combined fit takes advantage of the larger 
data sample and proper treatment of the correlations between the individual mea- 
surements to increase the precision and reliability of the result, respectively. 

Fit parameters, other than those of primary physics interest, whose limited 
knowledge affects the results, are called ‘nuisance parameters’. In particular, sys- 
tematic uncertainties are modelled by introducing nuisance parameters into the 
statistical model and allowing them to vary in the fit; those for which additional 
knowledge is present are constrained using Gaussian distributions. The mean and 
standard deviation of these distributions are set to the central value and uncer- 
tainty obtained either from other measurements or from control channels. The 
statistical component of the final uncertainty on the branching fractions is 
obtained by repeating the fit after fixing all of the constrained nuisance parameters 
to their best fitted values. The systematic component is then calculated by sub- 
tracting in quadrature the statistical component from the total uncertainty. In 
addition to the free fit, a two-dimensional likelihood ratio scan in the plane 
$\mathcal{B}(B^0 \to \mu^+\mu^-)$ versus $\mathcal{B}(B_s^0 \to \mu^+\mu^-)$ is performed. 
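The split of the total uncertainty into statistical and systematic parts described above can be sketched as follows. This is a minimal illustration, not the collaborations' fitting code: the function name and the numerical inputs are hypothetical, and the only relation assumed is the quadrature subtraction stated in the text.

```python
import math


def systematic_component(total: float, statistical: float) -> float:
    """Return the systematic uncertainty obtained by subtracting the
    statistical component from the total uncertainty in quadrature."""
    if statistical > total:
        raise ValueError("statistical component exceeds total uncertainty")
    return math.sqrt(total**2 - statistical**2)


# Hypothetical inputs (arbitrary units), chosen only to show the arithmetic:
# 'total' comes from the full fit, 'stat' from the fit repeated with the
# constrained nuisance parameters fixed to their best-fit values.
total, stat = 5.0, 4.0
syst = systematic_component(total, stat)
print(f"systematic component = {syst}")
```

Because the subtraction is done in quadrature, a systematic component that is small compared with the statistical one changes the total uncertainty only marginally.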

Feldman-Cousins confidence interval. The Feldman-Cousins likelihood ratio 
ordering procedure” is a unified frequentist method to construct single- and 
double-sided confidence intervals for parameters of a given model adapted to 
the data. It provides a natural transition between single-sided confidence intervals, 
used to define upper or lower limits, and double-sided ones. Since the single- 
experiment results$^{18,19}$ showed that the $B^0 \to \mu^+\mu^-$ signal is at the edge of the 
probability region customarily used to assert statistically significant evidence for a 
result, a Feldman-Cousins procedure is performed. This allows a more reliable 
determination of the confidence interval and significance of this signal without the 
assumptions required for the use of Wilks’ theorem. In addition, a prescription for 
the treatment of nuisance parameters has to be chosen because scanning the whole 
parameter space in the presence of more than a few parameters is computationally 
too intensive. In this case the procedure described by the ATLAS and CMS Higgs 
combination group® is adopted. For each point of the space of the relevant parameters, 
the nuisance parameters are fixed to their best value estimated by means 
of a maximum likelihood fit to the data with the value of $\mathcal{B}(B^0 \to \mu^+\mu^-)$ fixed and 
all nuisance parameters profiled with Gaussian penalties. Sampling distributions 
are constructed for each tested point of the parameter of interest by generating 
simulated experiments and performing maximum likelihood fits in which the 
Gaussian mean values of the external constraints on the nuisance parameters 
are randomized around the best-fit values for the nuisance parameters used to 
generate the simulated experiments. The sampling distribution is constructed 
from the distribution of the negative log-likelihood ratio evaluated on the simulated 
experiments by performing one likelihood fit in which the value of $\mathcal{B}(B^0 \to \mu^+\mu^-)$ 
is free to float and another with $\mathcal{B}(B^0 \to \mu^+\mu^-)$ fixed to the tested point 
value. This sampling distribution is then converted to a confidence level by evaluating 
the fraction of simulated experiments with a value for the negative 
log-likelihood ratio greater than or equal to the value observed in the data for each 
tested point. The results of this procedure are shown in Extended Data Fig. 5. 
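The last step of the procedure, turning the sampling distribution at one tested point into a confidence level, can be sketched as below. This is an illustrative fragment under stated assumptions: the toy test-statistic values are invented, the function name is hypothetical, and the generation and fitting of the simulated experiments are not reproduced. The returned fraction corresponds to the quantity 1 − CL plotted in Extended Data Fig. 5.

```python
import numpy as np


def one_minus_cl(q_toys, q_observed):
    """Fraction of simulated experiments whose negative log-likelihood
    ratio is greater than or equal to the value observed in data."""
    q_toys = np.asarray(q_toys, dtype=float)
    return float(np.mean(q_toys >= q_observed))


# Invented test-statistic values for five simulated experiments
# generated at a single tested value of the branching fraction.
q_toys = [0.1, 0.5, 1.2, 2.0, 3.5]
q_obs = 1.2
print(one_minus_cl(q_toys, q_obs))
```

In practice the number of simulated experiments per tested point sets the smallest value of 1 − CL that can be resolved, which is why many toys are needed near the edges of the interval.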
Extended Data Figure 1 | Distribution of the dimuon invariant mass $m_{\mu^+\mu^-}$ 
in each of the 20 categories. Superimposed on the data points in black are the 
combined fit (solid blue) and its components: the $B_s^0$ (yellow shaded) and $B^0$ 
(light-blue shaded) signal components; the combinatorial background (dash- 
dotted green); the sum of the semi-leptonic backgrounds (dotted salmon); and 
the peaking backgrounds (dashed violet). The categories are defined by the 
range of BDT values for LHCb, and for CMS, by centre-of-mass energy, by the 
region of the detector in which the muons are detected, and by the range of BDT 
values. Categories for which both muons are detected in the central region of 
the CMS detector are denoted with CR, those for which at least one muon was 
detected in the forward region with FR. [Axes: candidates / (40 MeV/$c^2$) versus $m_{\mu^+\mu^-}$ [GeV/$c^2$].] 
Extended Data Figure 2 | Distribution of the dimuon invariant mass $m_{\mu^+\mu^-}$ 
for the best six categories. Categories are ranked according to values of 
$S/(S + B)$, where $S$ and $B$ are the numbers of signal events expected assuming the 
SM rates and background events under the $B_s^0$ peak for a given category, 
respectively. The mass distribution for the six highest-ranking categories, three 
per experiment, is shown. Superimposed on the data points in black are 
the combined full fit (solid blue) and its components: the $B_s^0$ (yellow shaded) 
and $B^0$ (light-blue shaded) signal components; the combinatorial 
background (dash-dotted green); the sum of the semi-leptonic backgrounds 
(dotted salmon); and the peaking backgrounds (dashed violet). 
Extended Data Figure 3 | Schematic of the CMS detector and event display 
for a candidate $B_s^0 \to \mu^+\mu^-$ decay at CMS. a, The CMS detector and its 
components; see ref. 20 for details. b, A candidate $B_s^0 \to \mu^+\mu^-$ decay. 
The red arched curves represent the trajectories of the muons from the $B_s^0$ decay 
candidate. 


©2015 Macmillan Publishers Limited. All rights reserved 


[Figure: a, LHCb detector schematic (weight 5,600 tonnes; height 10 m; length 20 m) with labelled components: Vertex Locator, RICH1, Tracker Turicensis, dipole magnet, calorimeter and muon chambers. b, LHCb experiment event display: run 101412, event 8681643, 8 Sep 2011, 16:04:18.] 

Extended Data Figure 4 | Schematic of the LHCb detector and event display 
for a candidate $B_s^0 \to \mu^+\mu^-$ decay at LHCb. a, The LHCb detector and its 
components; see ref. 21 for details. b, A candidate $B_s^0 \to \mu^+\mu^-$ decay in the LHCb 
detector. The proton-proton collision occurs on the left-hand side, at the origin 
of the trajectories depicted with the orange curves. The red curves represent the 
trajectories of the muons from the $B_s^0$ candidate decay. 
Extended Data Figure 5 | Confidence level as a function of the $\mathcal{B}(B^0 \to \mu^+\mu^-)$ 
hypothesis. The value of 1 − CL, where CL is the confidence level 
obtained with the Feldman-Cousins procedure, as a function of $\mathcal{B}(B^0 \to \mu^+\mu^-)$ 
is shown in logarithmic scale. The points mark the computed 1 − CL values and 
the curve is their spline interpolation. The dark and light (cyan) areas define the 
two-sided ±1σ and ±2σ confidence intervals for the branching fraction, while 
the dashed horizontal line defines the confidence level for the 3σ one-sided 
interval. The dashed (grey) curve shows the 1 − CL values computed from the 
one-dimensional $-2\Delta\ln L$ test statistic using Wilks’ theorem. Deviations 
between these confidence level values and those from the Feldman-Cousins 
procedure” illustrate the degree of approximation implied by the asymptotic 
assumptions inherent to Wilks’ theorem”. [Horizontal axis: $\mathcal{B}(B^0 \to \mu^+\mu^-)$ in units of $10^{-10}$.] 
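For orientation, the asymptotic curve mentioned in this caption can be sketched in a few lines. The fragment assumes the standard Wilks result that a one-dimensional $-2\Delta\ln L$ follows a chi-square distribution with one degree of freedom, whose tail probability equals erfc(sqrt(q/2)); the function name is hypothetical and the inputs are illustrative values, not numbers from the analysis.

```python
import math


def wilks_one_minus_cl(q: float) -> float:
    """Asymptotic 1 - CL for a one-dimensional -2*Delta(ln L) value q,
    assuming q follows a chi-square distribution with one degree of freedom."""
    if q < 0:
        raise ValueError("test statistic must be non-negative")
    return math.erfc(math.sqrt(q / 2.0))


# Illustrative values of the test statistic.
for q in (1.0, 4.0, 9.0):
    print(q, wilks_one_minus_cl(q))
```

Values of q equal to 1, 4 and 9 correspond to the familiar two-sided 1σ, 2σ and 3σ levels, which is what makes the comparison with the toy-based Feldman-Cousins values direct.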
Extended Data Figure 6 | Likelihood contours for the ratios of the 
branching fractions with respect to their SM prediction, in the $\mathcal{S}^{B^0}_{\rm SM}$ versus 
$\mathcal{S}^{B_s^0}_{\rm SM}$ plane. a, The (black) cross marks the central value returned by the fit. 
The SM point is shown as the (red) square located, by construction, 
at $\mathcal{S}^{B_s^0}_{\rm SM} = \mathcal{S}^{B^0}_{\rm SM} = 1$. Each contour encloses a region approximately corresponding 
to the reported confidence level. The SM branching fractions are assumed 
uncorrelated to each other, and their uncertainties are accounted for in the 
likelihood contours. b, c, Variations of the test statistic $-2\Delta\ln L$ for $\mathcal{S}^{B_s^0}_{\rm SM}$ and 
$\mathcal{S}^{B^0}_{\rm SM}$ are shown in b and c, respectively. The SM is represented by the (red) 
vertical lines. The dark and light (cyan) areas define the ±1σ and ±2σ 
confidence intervals, respectively. 
Extended Data Figure 7 | Search for the $B_s^0 \to \mu^+\mu^-$ and $B^0 \to \mu^+\mu^-$ 
decays, reported by 11 experiments spanning more than three decades, and 
by the present results. Markers without error bars denote upper limits on the 
branching fractions at 90% confidence level, while measurements are denoted 
with error bars delimiting 68% confidence intervals. The solid horizontal lines 
represent the SM predictions for the $B_s^0 \to \mu^+\mu^-$ and $B^0 \to \mu^+\mu^-$ branching 
fractions’; the blue (red) lines and markers relate to the $B_s^0 \to \mu^+\mu^-$ ($B^0 \to 
\mu^+\mu^-$) decay. Data (see key) are from refs 17, 18, 31-60; for details see Methods. 
Inset, magnified view of the last period in time. 
Greenland supraglacial lake drainages triggered 
by hydrologically induced basal slip 


Laura A. Stevens, Mark D. Behn, Jeffrey J. McGuire, Sarah B. Das, Ian Joughin, Thomas Herring, 
David E. Shean & Matt A. King 


Water-driven fracture propagation beneath supraglacial lakes 
rapidly transports large volumes of surface meltwater to the base 
of the Greenland Ice Sheet’. These drainage events drive transient 
ice-sheet acceleration'~ and establish conduits for additional sur- 
face-to-bed meltwater transport for the remainder of the melt 
season’**, Although it is well established that cracks must remain 
water-filled to propagate to the bed”’, the precise mechanisms that 
initiate hydro-fracture events beneath lakes are unknown. Here we 
show that, for a lake on the western Greenland Ice Sheet, drainage 
events are preceded by a 6-12 hour period of ice-sheet uplift and/or 
enhanced basal slip. Our observations from a dense Global 
Positioning System (GPS) network allow us to determine the dis- 
tribution of meltwater at the ice-sheet bed before, during, and after 
three rapid drainages in 2011-2013, each of which generates tensile 
stresses that promote hydro-fracture beneath the lake. We hypo- 
thesize that these precursors are associated with the introduction 
of meltwater to the bed through neighbouring moulin systems 
(vertical conduits connecting the surface and base of the ice sheet). 
Our results imply that as lakes form in less crevassed, interior 
regions of the ice sheet’ *, where water at the bed is currently less 
pervasive”'*'°, the creation of new surface-to-bed conduits caused 
by lake-draining hydro-fractures may be limited. 

Greenland Ice Sheet flow accelerates at the beginning of the melt 
season””*, when surface meltwater reaches the bed via conduits’**’7"*. 
Inland from the ice margin, this process is often associated with the 
drainage of supraglacial lakes’*”’. Most supraglacial lakes drain slowly, 
overfilling their banks and routeing lake water via surficial streams 
to nearby crevasses and/or moulins*”®*'. A smaller fraction (~13%) 
of lakes drain rapidly (<1 day)’®, in some cases as rapidly as a 
few hours’, through large (kilometre-scale length) hydro-fractures 
that form directly beneath the lake basin. These hydro-fractures sub- 
sequently close except where continued stream flow keeps moulins 
open for the remainder of the melt season’. While the former style 
of drainage requires the presence of pre-existing crevasses and/or 
moulins, the latter has the potential to create new surface-to-bed 
meltwater pathways through the ice sheet, and is thus an area of 
intense study. 

While the basic principles of hydro-fracture through glacial ice are 
well understood’, the mechanism that triggers the formation of 
kilometre-scale length hydro-fractures in compressional basins where 
lakes form is unknown’. A necessary condition for generating 
through-ice hydro-fractures is that a supraglacial lake must contain 
a sufficient volume of water to keep a fracture filled as it propagates 
from the surface to the bed’~°. However, large lakes with volumes well 
above this threshold often do not drain over multiple summers’. 
Additionally, lakes repeatedly fill basins containing numerous healed 
hydro-fracture cracks and moulins created during drainage events in 
previous years'*’, implying that the presence of pre-existing cracks 
does not necessarily lead to immediate drainage. Thus, identifying the 
first-order control on hydro-fracture initiation preceding rapid lake 
drainages has remained elusive. 

In this study, we investigate hydro-fracture initiation and rapid 
drainage at North Lake (68.72° N, 49.50° W), an ~2.5-km diameter 
supraglacial lake located south of the Jakobshavn Isbræ catchment on 
thick ice (~980 m) (Fig. 1). This site has been the focus of in-depth 
study since 2006, when the first detailed evidence for hydro-fracture to 
the bed of the Greenland Ice Sheet was collected using GPS measure- 
ments from North Lake base station’ (Fig. 1). During the 2006 event, a 
slow, steady lake-level drop was observed over a 16-h pre-drainage 
stage followed by the rapid (<2 h) drainage coincident with vertical 
and horizontal ice displacement’. Subsequent modelling of the North 
Lake base station data collected during this event found that vertical 
uplift was caused almost entirely by a horizontal cavity opening at the 
ice-bed interface due to rapid injection of meltwater, whereas opening 
of the through-ice vertical crack was the principal contributor to the 
horizontal surface displacements”. Similar observations have since 
been made at other west Greenland Ice Sheet supraglacial lakes”, all 
providing definitive evidence for rapid meltwater drainage to the bed 
during hydro-fracture events. 

Figure 1 | Synthetic aperture radar (SAR) image on 17 June 2011 showing 
the extent of North Lake (centre) and surrounding lakes 1 day before the 
2011 rapid North Lake drainage. Yellow triangles, GPS locations. The M1 
moulin is also shown. Image copyright DigitalGlobe. 

A limitation of these previous studies was insufficiently dense obser- 
vations of surface motion required to directly constrain the mech- 
anism and location of hydro-fracture initiation and the spatial 
distribution of meltwater at the ice-sheet bed. Here we present results 
from a spatially-dense array of 16 GPS stations positioned around 
North Lake between 2011 and 2013 (Fig. 1). This array captured the 
dynamic response of the ice sheet to rapid lake drainages in each of 
the 3 years of the study, allowing us to infer the evolving hydro-fracture 
geometry and spatial distribution of meltwater at the ice-sheet bed 
before, during, and after drainages. 

From these GPS data, we identify a period of precursory ice motion, 
indicative of the presence of an increased volume of water reaching 
the bed within the GPS array, hours before each year’s local hydro- 
fracture initiation and rapid lake drainage (Extended Data Table 1). 
The displacement anomalies (Fig. 2a-c) show the along-flowline, 
crack-normal, and vertical displacement histories for 2 days before 
and 1 day after each drainage event at stations NL08 and NL01 or 
NL03 (Fig. 1 and Methods). We pick three time points for each drainage 
(using all available stations) that designate the start of the precursor, 
hydro-fracture initiation, and the maximum hydro-fracture 
opening (Fig. 2 and Extended Data Table 1). 
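Two of these time points have simple operational definitions in the Fig. 2 caption: hydro-fracture initiation is the time of maximum southward crack-normal acceleration at NL08, and maximum opening is the time of maximum southward crack-normal displacement. A minimal sketch of that picking step is given below. It is not the authors' code: the function name, the synthetic displacement pulse, and the restriction of the acceleration search to times before the displacement peak are all assumptions made for illustration.

```python
import numpy as np


def pick_drainage_time_points(t_hours, southward_disp_m):
    """Return (t_initiation, t_max_opening) from a crack-normal
    displacement series, with southward displacement taken as positive."""
    t = np.asarray(t_hours, dtype=float)
    d = np.asarray(southward_disp_m, dtype=float)
    i_max = int(np.argmax(d))             # maximum southward displacement
    velocity = np.gradient(d, t)          # first time derivative
    accel = np.gradient(velocity, t)      # second time derivative
    # Assumption: initiation must precede the displacement peak.
    i_init = int(np.argmax(accel[: i_max + 1]))
    return t[i_init], t[i_max]


# Synthetic 8-cm southward excursion centred at t = 5 h (illustrative only).
t = np.linspace(0.0, 10.0, 1001)
disp = 0.08 * np.exp(-0.5 * (t - 5.0) ** 2)
t_init, t_max = pick_drainage_time_points(t, disp)
print(t_init, t_max)
```

Real GPS series are noisy, so a double numerical derivative would normally be taken on smoothed data; the start of the precursor, defined as the first distinguishable deviation from the background velocity field, requires a separate threshold choice that is not sketched here.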

The 2011 precursor is manifested as vertical uplift followed by 
increased displacement in the flow-line direction at stations southwest 
of moulin M1 (NL07, NL08, NL10) over the 10 h leading up to rapid 
lake drainage (Figs 1, 2a and 3a). This is consistent with field observations 
that suggest that, as North Lake filled over the preceding days, the 
western shoreline reached M1, allowing meltwater to begin pooling 
in and reactivating M1 before lake drainage, thus permitting increased 
basal slip (Extended Data Figs 1 and 2). Similarly, we observe increased 
displacement in the flow-line direction before the 2012 and 2013 
hydro-fracture events, which we also interpret to be hydrologically 
induced (Fig. 2b, c) (see ‘The 2012 North Lake drainage’ and ‘The 
2013 North Lake drainage’ in Methods). In 2012 the precursor is 
manifested as anomalous along-flowline displacements at stations in 
the northern end of the array (for example, NL01 and NL02), but 
shows little signal at the southern stations (Extended Data Fig. 4a). 
The 2013 precursor is manifested as enhanced flowline displacements 
at all western stations (FL03, NL04, NL07, NL08, NL10), as well as 
vertical uplift focused just west of the lake basin (Extended Data 
Fig. 5a). All three precursors have similar durations (6-12 h), but they 
occur in different subsets of the spatial array. After each precursor 
there is clear evidence of the main 4-km long hydro-fracture opening 
and subsequent rapid lake drainage, as indicated by the ~3-h long, 
5-10 cm excursion of NL08 in the crack-normal (southward) direction, 
which is rapidly recovered owing to closing of the fracture 
(Fig. 2a-c). In all 3 years the hydro-fracture opening phase is accom- 
panied by considerable (>20 cm) uplift and enhanced along-flowline 
motion across many stations in the network. 

To quantitatively constrain the processes responsible for these surface 
motions, we exploit the high spatial density of our GPS data to invert for 
the space-time history of deformation surrounding the lake drainages. 
We use the Network Inversion Filter (NIF) algorithm” to model the 
GPS time series of drainage-related surface motion as the summation of 
three deformation sources: (1) hydro-fracture opening, (2) basal cavity 
opening (due to the rapid injection of meltwater), and (3) extra basal slip 
above the background rate (due to enhanced basal lubrication) 
(Methods) (Extended Data Figs 6-8). The NIF utilizes Green’s functions 
for an elastic half-space to relate the surface displacement time series to 
the space-time history of opening and slip along prescribed planes 
describing each of these deformation sources™* (Methods). 

Figure 2 | The 2011, 2012, and 2013 North Lake drainages. GPS station 
displacement less background velocities is shown in solid (dashed) lines for 
station NL08 (NL01 or NL03) flowline displacement (blue), crack-normal 
displacement (black), and relative vertical uplift (red) over the 2 days before and 
1 day after the (a) 2011, (b) 2012, and (c) 2013 drainage events. The bottom row 
shows NIF-derived hydro-fracture opening volume (black), basal cavity 
opening volume (red), and basal slip moment (blue) across the domain for the 
2 days before and 1 day after the (d) 2011, (e) 2012, and (f) 2013 drainage 
events. The coordinate system is orientated such that hydro-fracture opening is 
expressed primarily in the horizontal crack-normal component, while basal slip 
is primarily expressed in the horizontal flowline component, and basal cavity 
opening is primarily reflected in the vertical component data. The precursor 
and rapid lake drainage periods are designated by three time points across the 
drainages: (1) the start of the precursor at the time of first distinguishable 
deviation of station vertical uplift, crack-normal, or flowline displacement from 
the background velocity field (‘1. Start of precursor’); (2) hydro-fracture 
initiation at the time of maximum NL08 southward crack-normal acceleration 
(‘2. H-F initiation’); and (3) the maximum hydro-fracture opening at the time 
of maximum southward NL08 crack-normal displacement (‘3. Max H-F 
opening’) (Extended Data Table 1). 

Figure 3 | The 2011 basal slip and cavity opening at hydro-fracture 
initiation and maximum hydro-fracture opening. NIF-calculated (a) extra 
basal slip accumulated, (b) basal cavity opening, and (c) hydro-fracture 
crack opening at the time of the 2011 (a-c) hydro-fracture initiation and 
(d-f) maximum hydro-fracture opening (time points shown in Fig. 2a). Moulin 
location, last known lake shoreline, GPS stations, and NIF vertical crack surface 
trace derived from SAR imagery are shown as a yellow circle, blue line, black 
triangles, and black line, respectively. Vector fields show GPS (NIF) 
displacement less background velocities in black (green) for (a) the period 
between the start of the precursor and hydro-fracture initiation, and (d) the 
period between hydro-fracture initiation and maximum hydro-fracture opening. 
Error ellipses of 1 sigma are shown for the GPS displacements (blue ellipses). 
Basal sub-elements are 0.83 km by 0.83 km, resulting in 144 sub-elements over a 
10 km × 10 km region. DOY, day of year (DOY 169 in 2011 was 18 June). 

The NIF results provide estimates of the spatial distribution of melt- 
water at the ice-sheet bed before, during, and after drainages. Full inver- 
sion results for the 2011-2013 drainages are presented as videos 
(Supplementary Videos 1-3). Figure 2d-f shows spatially integrated 
results from 2011 to 2013 for the three deformation sources: hydro- 
fracture crack volume, basal cavity volume, and basal slip (shown as 
moment Mo; see Methods). For the 2011 event, we identify basal cavity 
opening and slip associated with the precursor southeast and southwest 
of M1, respectively, before the hydro-fracture opens (Figs 2d and 
3a-c)—indicating the injection of meltwater at the ice-sheet bed before 
local hydro-fracture initiation. Immediately after the precursor, the 
hydro-fracture opens first at M1, and then propagates east beneath the 
basin (Supplementary Video 1). At the time of maximum hydro-fracture 
opening (Fig. 3d-f), the basal cavity volume (Fig. 2d) is nearly equivalent 
to the North Lake pre-drainage volume estimate of 0.007 ± 0.001 km³ 
(Methods) (Extended Data Table 1 and Extended Data Figs 2 and 3). The 
agreement between lake volume calculations and NIF estimates of basal 
cavity volume validates our inversion results. Basal slip is focused within, 
and a few kilometres south of, the lake basin, but not northwards, pos- 
sibly because of a known ridge in the basal topography (Fig. 3d)°”°. 

Inversion results for the rapid drainage events in 2012 and 2013 also 
suggest precursory activity (see ‘The 2012 North Lake drainage’ and 
‘The 2013 North Lake drainage’ in Methods). The 2012 precursor is 
associated with basal slip 3 km north of the hydro-fracture, possibly 
because of enhanced lubrication from nearby meltwater input to the 
bed (Extended Data Fig. 4a). The 2013 precursor was the most extensive, 
producing enhanced basal slip over a 5 km × 5 km area as well as 
significant basal uplift (Extended Data Fig. 5a, b). In both 2012 and 
2013, the hydro-fracture opening and lake drainage produce ~50 cm 
of basal cavity opening beneath the lake basin and enhanced basal slip 
over a wide area (Extended Data Figs 4d-f and 5d-f). 

Previous work on the 2006 North Lake drainage event identified a 
slow steady drop in lake level in the 16 h before the rapid hydro-fracture 
induced drainage, and it was hypothesized that this pre-drainage may 
have been due to the initial filling of a slowly propagating hydro-fracture 
directly beneath the lake basin, or water over-spilling into an adjacent 
crack system’. However, the observations in 2006 were insufficient to 
distinguish between these (or alternative) mechanisms. Our NIF 
results show no evidence for the slow downward propagation of a 
hydro-fracture before lake drainage, which would be manifest as 
crack-normal horizontal displacements. Rather, the inversions clearly 
demonstrate that each drainage is preceded by a period of enhanced 
basal slip and/or uplift, which is probably caused by the injection of 
meltwater at the bed via neighbouring hydro-fractures and moulins. 
Intriguingly, precursor motion was also observed before other lake 
drainages’, although it was not identified as a triggering mechanism 
for hydro-fracture initiation (see ‘Precursors observed in previous studies’ 
in Methods). The observation of precursors before rapid lake 
drainages strongly suggests that they play an important role in triggering 
hydro-fractures, possibly by inducing local stress perturbations that 
overcome the background compressive stresses found in lake basins°”*. 

To test the hypothesis that local stress perturbations play an import- 
ant role in triggering hydro-fractures, we compared the background 
viscous stresses in the lake basin with the elastic stress change induced 
by the precursory basal slip and cavity opening (Methods). We calculated 
background compressive stresses of order −70 ± 40 kPa within 
the lake basin, comparable to other west Greenland lake basin estimates”. 
Before the start of the precursor, changes in crack-normal 
stress ($\Delta\sigma_n$) on the hydro-fracture are $\Delta\sigma_n$ = 0 ± 40 kPa. However, 
throughout the precursor $\Delta\sigma_n$ increases, attaining maximum tensile 
stresses of +100 to +600 kPa at the top of the hydro-fracture at the 
onset of rapid drainage (Fig. 4a-c). These calculations confirm that the 
drainage precursors can generate tensile crack-normal stresses near 
the surface with sufficient magnitude to temporarily overcome the 
compressive background stress and promote hydro-fracture initiation. 
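The comparison made in this paragraph can be illustrated with the central values quoted in the text. The sketch below simply adds the precursor-induced elastic stress change to the background stress (tensile positive); treating the two as directly additive, ignoring the ±40 kPa uncertainties, and the function name itself are assumptions made for illustration rather than the authors' calculation.

```python
def net_crack_normal_stress(background_kpa: float, delta_kpa: float) -> float:
    """Net crack-normal stress in kPa (tensile positive), assuming the
    elastic change simply adds to the background viscous stress."""
    return background_kpa + delta_kpa


BACKGROUND_KPA = -70.0  # central value of the background compressive stress

# Range of maximum precursor-induced tensile stress changes quoted above.
for delta in (100.0, 600.0):
    net = net_crack_normal_stress(BACKGROUND_KPA, delta)
    state = "tensile" if net > 0 else "compressive"
    print(f"delta = +{delta:.0f} kPa -> net = {net:+.0f} kPa ({state})")
```

With central values, the net stress at the top of the hydro-fracture is tensile across the quoted range of stress changes, whereas the pre-precursor change of 0 ± 40 kPa leaves the basin compressive.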

Our results and reinterpretation of previous studies (see ‘Precursors 
observed in previous studies’ in Methods) indicate that injection of 
surface meltwater, routed from supraglacial lakes to the bed through 
pre-existing crevasses or conduits, is required to trigger hydro-fracture 
initiation and subsequent rapid lake drainage in an otherwise compressional 
basin (Extended Data Fig. 9a-c). As shown previously, a 
necessary condition for hydro-fracture propagation is a sufficient 
volume of water to keep the fracture filled”’. Lacking a known trigger- 
ing mechanism, previous studies used only this volume threshold to 
predict that lake drainages would occur’*”””*. However, we do not find 
that lakes spontaneously hydro-fracture once they surpass this thresh- 
old’. In all 3 years, North Lake contained approximately five times the 
critical volume of water necessary to keep a 4-km long crack open to 
the bed (Methods) before hydro-fracture occurred. Thus, we argue 
against the exclusive use of a volume threshold for triggering supra- 
glacial lake drainage in regional ice-sheet modelling studies. 

We hypothesize that if stress transients associated with enhanced 
meltwater transport to the bed beneath lakes are required to initiate 
surface-to-bed hydro-fractures in compressional lake basins, then 
lakes are less likely to create large-scale hydro-fractures in interior 
regions of the Greenland Ice Sheet where meltwater access to the 
bed is limited by lack of pre-existing crevasses*'*"°”’. As new lakes 
form at higher elevations in a warming climate’”* they will encounter 
longer-wavelength surface topography***”’, resulting in greater distances 
between compressive lake basins and extensional crevasse-forming 
regions. Thus, lake water must be routed greater distances 
in surface streams down the ice sheet before encountering crevasses 
where through-ice drainage conduits can be established, minimizing 
local stress transients and potentially obstructing in situ rapid drainage 
of high-elevation lakes and the formation of new surface-to-bed 
hydro-fractures beneath lake basins”. This indicates that although lake 
drainages may be important for inland expansion of enhanced flow at 
mid-elevations’®, such expansions are more probably influenced by 
longitudinal coupling at high elevations” (see ‘Implications for inland 
expansion of seasonal acceleration’ in Methods). Finally, the supply of 
meltwater to the bed may not be well correlated with the location, 
abundance, and size of high-elevation supraglacial lakes. 

Figure 4 | Change in crack-normal stress during the precursor. Changes in 
the crack-normal elastic stresses ($\Delta\sigma_n$) (in kilopascals) (compressive, negative; 
tensile, positive) on the hydro-fracture crack as a result of basal cavity opening 
and accumulated extra basal slip during the (a) 2011, (b) 2012, and (c) 2013 
precursor. Stresses are calculated at the start of the precursor and hydro- 
fracture initiation, coinciding with the times noted in Extended Data Table 1, 
and then differenced to show the change in elastic stress that occurs during 
the precursor. [Panel titles: ‘Hydro-fracture initiation less start of precursor’ for each year; horizontal axis: along strike (km); colour scale: $\Delta\sigma_n$ (kPa), from −500 to 500.] 

4 JUNE 2015 | VOL 522 | NATURE | 75 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

GPS data. Continuous 30-s resolution GPS data collected by dual-frequency 
Trimble NetR9 receivers were processed with Track software*’. GPS data for each 
station were processed individually relative to the 30-s resolution Greenland GPS 
Network (GNET) KAGA base station located on bedrock ~55 km from North 
Lake**. The 30-s resolution position estimates and corresponding uncertainties 
from Track were used in the NIF and plotted in Extended Data Figs 6 and 7 and 
Supplementary Videos 1-3. For plotting purposes, the data in Fig. 2 were 
smoothed over a 2-min window with a five-point central moving average. Error 
output from Track software is given as 1-sigma errors for the 30-s resolution east, 
north, and up offsets from the coordinates of the first position in the time series, 
but not the full covariance matrix’. Horizontal (vertical) 1-sigma errors are
consistently ±2 cm (±5 cm) across all stations and years. 
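
The plotting-only smoothing described above (a five-point central moving average over 30-s samples) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the choice to leave the two edge samples on each side as NaN are assumptions.

```python
import numpy as np

def central_moving_average(values, window=5):
    """Centred moving average; samples where the window does not fit are NaN.

    With 30-s samples, window=5 spans the two neighbours on each side,
    i.e. 2 min between the first and last sample in the window.
    """
    values = np.asarray(values, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred average")
    smoothed = np.full(values.shape, np.nan)
    if values.size < window:
        return smoothed  # series too short to smooth
    half = window // 2
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where the full window overlaps the data.
    smoothed[half:values.size - half] = np.convolve(values, kernel, mode="valid")
    return smoothed

# Hypothetical displacement samples (metres) at 30-s spacing.
example = central_moving_average([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

Only the smoothed series would be used for display; the unsmoothed 30-s estimates remain the input to the NIF.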

Of the 16 GPS stations in the North Lake array, stations NL08, NL01, and/or
NL03 best capture differences between the precursors over the three drainage
events (Fig. 2). NL08 is consistently the most responsive station during lake
drainage events, and proves to be the best single-station indicator of the
drainage event as a whole. NL08 captures the 2011 and 2013 precursors well
(Fig. 2a, c). In 2012, the precursor is manifested as anomalous along-flowline
displacements observed at stations NL01 and NL02. Thus, we show one of these
northern stations (NL01) alongside the NL08 time series to show the
along-flowline speed-up in the northern portion of the array during the 2012
precursor (Fig. 2b). In 2013, stations NL01 and NL02 were not recording during
the drainage, leaving NL03 as the closest station in the northern portion of the
array to NL01/NL02 (Fig. 2c). Lake drainage duration is calculated on the basis
of NL08 crack-normal motion from the start of the hydro-fracture opening to when
NL08 crack-normal motion regains its southward displacement as the crack closes
(~1-2 h after time of maximum hydro-fracture opening) (Fig. 2 and Extended
Data Table 1). 
Lake volume. The NASA Ames Stereo Pipeline*’ stereographic software was used 
to generate ~2 m per pixel digital elevation models (DEMs) of the empty, post- 
drainage North Lake basin using a WorldView-1 stereopair acquired on 21 July 
2011 and a WorldView-2 stereopair acquired on 5 July 2013 (Extended Data 
Fig. 2). Orthorectified ~0.5 m per pixel WorldView images depicting the last 
available pre-drainage North Lake shoreline were used to constrain the lake shore- 
line position and, thus, lake depth and volume from the DEM. The 2011 lower 
bound on the lake volume estimate for North Lake was calculated from the 
shoreline position on a WorldView-1 image taken on 17 June 2011, 1 
day before the 2011 drainage. The 2013 lower bound for lake volume for North 
Lake was calculated from the shoreline position on a WorldView-2 image taken on 
17 June 2013, 2 days before the 2013 drainage. While small-scale surface 
features (moulins, supraglacial stream channels) advect ~100 m yr⁻¹ to the
west-northwest, the North Lake basin geometry is the result of fixed bed topography, 
and does not change significantly between summers (Extended Data Fig. 3). Lake 
volume estimates are given in Extended Data Table 1. 
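
The shoreline-and-DEM volume estimate can be illustrated with a short sketch. It assumes a single horizontal water surface at the mapped shoreline elevation and that every DEM cell below that elevation belongs to the lake; the function name, the toy grid, and those simplifications are illustrative assumptions rather than the processing actually used.

```python
import numpy as np

def lake_volume_km3(dem_m, shoreline_elev_m, pixel_size_m=2.0):
    """Volume between an empty-basin DEM and a flat water surface.

    dem_m: 2-D array of post-drainage surface elevations (m).
    shoreline_elev_m: elevation of the mapped pre-drainage shoreline (m).
    pixel_size_m: DEM cell size (the text quotes ~2 m per pixel).
    """
    dem = np.asarray(dem_m, dtype=float)
    # Water depth is positive only where the basin lies below the shoreline.
    depth = np.clip(shoreline_elev_m - dem, 0.0, None)
    # nansum ignores cells with no DEM value.
    volume_m3 = np.nansum(depth) * pixel_size_m ** 2
    return volume_m3 / 1e9  # m^3 to km^3

# Toy 2 x 2 basin: depths are 0, 2, 1 and 0 m, so 3 m x 4 m^2 = 12 m^3.
toy_volume = lake_volume_km3([[10.0, 8.0], [9.0, 12.0]], shoreline_elev_m=10.0)
```

Because the shoreline comes from the last image before drainage, a volume computed this way is a lower bound, as the text notes.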

In 2012, a lack of satellite images of North Lake basin during the days leading up
to drainage prevented lake volume calculation via shoreline position and DEM
methods. Output from the Regional Atmospheric Climate Model for the
Greenland Ice Sheet (RACMO2/GR)** for 2011 and 2012 was used to compare
estimated cumulative runoff in the North Lake region (68.66° N, −49.52° W) on
the day of lake drainage between the 2 years. We found that RACMO2/GR
values of cumulative runoff at 18 June 2011 and 9 June 2012 are very similar at
0.0030 kg m⁻² and 0.0031 kg m⁻², respectively. Average daily runoff values at this
location during mid-June are of the order 0.0003 kg m⁻² d⁻¹. Thus, we conclude
the pre-drainage 2012 North Lake volume is of the order of the pre-drainage 2011
North Lake volume: 0.007 ± 0.001 km³. We hypothesize that the pre-drainage
2012 North Lake shoreline reached M1 at the time of drainage. 
NIF. We implemented the NIF algorithm” to determine the amount of open- 
ing along a vertical crack and slip and opening along a basal crack during the 
2011, 2012, and 2013 North Lake rapid drainage events. The NIF utilizes 
Green’s functions for an elastic half-space” to relate surface displacement time 
series to the space-time history of opening and slip along prescribed planes. 
The North Lake basin is modelled using an isotropic elastic half-space with the 
GPS stations at the surface. Three deformation sources are included: (1) 
hydro-fracture opening, (2) basal cavity opening (due to the rapid injection 
of meltwater), and (3) extra basal slip above the background rate (due to 
enhanced basal lubrication). The NIF assumes linear elastic behaviour for 
the ice sheet, and treats hydro-fracture as a horizontal elastic dislocation along 
a vertical crack within the ice*. These assumptions are justified by vertical- 
crack propagation timescales (seconds to minutes) that are shorter than the 
Maxwell time of ice (6-24 h)”????3. 
We model the GPS position vector X for each GPS station i as a function of time
t relative to the starting time t_0 as follows**”*:

$$X_i(t) - X_i(t_0) - V_i\,(t - t_0) = G_i\,s(t) + L_i(t) + F f(t) + \varepsilon(t) \qquad (1)$$

where the left-hand side represents the drainage-related surface motion of the GPS
stations obtained by removing the station background velocity field** V. V is
determined for each station by calculating station velocity over the 2 days of data
available before the start of the precursor. On the right-hand side of equation (1), G
represents the matrix of elastic Green’s functions”, s(t) is a vector of slip (or
opening) on each deformation plane subfault at time t, L_i(t) is component-specific
coloured noise, F f(t) represents reference frame errors** at time t, and ε(t)
represents normally distributed white-noise observation error at time t. We model
L_i(t) with a Brownian random walk model as has been done in previous studies of
high-rate GPS data®’. This term is necessary to absorb coloured noise in the time
series due to unmodelled errors in the position estimates and possibly local
benchmark instabilities. The random walk is described by a scale parameter τ, which
we estimated to be 5 cm d^(-1/2) by modelling data before the start of the precursor
as a combination of a background velocity and random walk (5 cm d^(-1/2) was the
smallest value that resulted in white residuals for such a model). We use three
perpendicular translations for F f(t) because of the small size of our network’*. The
data vector in the Kalman filter is given by the GPS position data X_i(t) minus the
background velocity (D_i(t) = X_i(t) − X_i(t_0) − V_i (t − t_0)). The data covariance
matrix is assumed to be diagonal and derived from the individual component
errors from the Track processing modified appropriately given the uncertainty in
our estimate of V (ref. 38). 
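
As a minimal sketch of how the data vector D_i(t) could be formed for one station component, the example below fits a constant background velocity to the 2 days before the precursor and removes it. The function name, the least-squares line fit, and the use of the first sample as t_0 are assumptions made for illustration; the text states only that velocity is calculated over the 2 days of data before the precursor.

```python
import numpy as np

def drainage_related_motion(t_days, x_m, t_precursor_start, fit_days=2.0):
    """Return (D, V) with D(t) = X(t) - X(t0) - V * (t - t0) for one component.

    t_days: sample times in decimal days; x_m: positions in metres.
    V is the slope of a straight line fitted to the fit_days of data
    immediately before t_precursor_start.
    """
    t = np.asarray(t_days, dtype=float)
    x = np.asarray(x_m, dtype=float)
    pre = (t >= t_precursor_start - fit_days) & (t < t_precursor_start)
    velocity, _intercept = np.polyfit(t[pre], x[pre], 1)  # m per day
    residual = x - x[0] - velocity * (t - t[0])
    return residual, velocity

# Synthetic check: steady flow of 0.4 m/day plus a 0.1 m step after day 2.
t = np.linspace(0.0, 3.0, 301)
x = 0.4 * t + np.where(t > 2.0, 0.1, 0.0)
d, v = drainage_related_motion(t, x, t_precursor_start=2.0)
```

In the synthetic case the residual is zero before the step and 0.1 m after it, which is the kind of drainage-related signal the left-hand side of equation (1) isolates.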

The vertical plane for the hydro-fracture extends from 100 to 1,100 m depth 
striking along the surface expression of the most substantial recurring hydro- 
fracture crack intersecting M1 on the western edge of the lake basin (Extended 
Data Fig. 1). The vertical plane does not start at the surface of the elastic half-space 
because the Green’s functions used in the NIF algorithm are for dislocations 
within the half-space*’. This approximation is sufficient because the vertical crack 
is located within the lake and our GPS stations are located more than 1 km outside 
the lake basin and, thus, are not sensitive to the shallowest 100 m of crack opening. 
On the vertical plane we solve only for mode-I tensile motion corresponding to 
opening of the crack". The vertical plane is subdivided into 24 subfaults 
along strike and 6 subfaults along dip; each vertical subfault is 0.19 km wide 
and 0.16 km tall. 

The basal plane is defined as a 100-km² sub-horizontal plane at 1,100 m depth, 
dipping 0.01° to the west, and centred beneath the North Lake basin (68.723° N, 
−49.53° W). The basal plane strikes 186° from north, perpendicular to the
direction of average ice velocity as determined from the average of all GPS station 
velocities in the days leading up to each year’s drainage event. We estimate both 
mode-I tensile and dip-slip motion in the direction of ice flow on the basal plane. 
The basal plane is subdivided into 12 subfaults along dip and 12 subfaults along 
strike; each subfault is a 0.83 km × 0.83 km square. The shallow depth of the basal 
plane within the half-space results in Green’s function magnitudes above 0.95 for 
the uplift response at GPS stations to basal plane opening”. Therefore, we neglect 
the material property contrast at the ice sheet-bedrock interface in our model 
because it would only modify the Green’s function magnitude by a few per cent. 
The geometry of our array, with a 10:1 ratio of horizontal distance across the GPS 
array to ice thickness, allows us to resolve slip and opening on the basal plane on 
the length scale of the station spacing (1-3 km). 
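
The subfault dimensions quoted for the two planes can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the constant and function names are hypothetical.

```python
import math

# Values quoted in the text.
BASAL_AREA_KM2 = 100.0
BASAL_SUBFAULTS_PER_SIDE = 12
VERTICAL_N_STRIKE = 24
VERTICAL_N_DIP = 6
VERTICAL_SUBFAULT_WIDTH_KM = 0.19
VERTICAL_SUBFAULT_HEIGHT_KM = 0.16

def basal_subfault_side_km(area_km2=BASAL_AREA_KM2,
                           n_per_side=BASAL_SUBFAULTS_PER_SIDE):
    """Side length of one square subfault on a square basal plane."""
    return math.sqrt(area_km2) / n_per_side

def vertical_plane_extent_km(n_strike=VERTICAL_N_STRIKE,
                             n_dip=VERTICAL_N_DIP,
                             width_km=VERTICAL_SUBFAULT_WIDTH_KM,
                             height_km=VERTICAL_SUBFAULT_HEIGHT_KM):
    """(along-strike length, down-dip height) implied by the subfault grid."""
    return n_strike * width_km, n_dip * height_km

side = basal_subfault_side_km()               # about 0.83 km
length_km, height_km = vertical_plane_extent_km()  # about 4.56 km by 0.96 km
```

The basal grid reproduces the stated 0.83 km squares, and the vertical grid spans about 4.56 km along strike and 0.96 km down dip, close to the 1 km between the stated 100 m and 1,100 m depths.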

Our choice of a nearly horizontal basal plane is motivated by the presence of a 
relatively flat basin in the bedrock topography centred directly beneath North 
Lake*”*, which yields a nearly horizontal basal slope across the entire GPS array. 
Moreover, sensitivity tests show that our NIF results are robust for basal plane dips 
up to 5°. This value is greater than the maximum bedrock slope (3.4°) measured 
from the bedrock basin centre beneath North Lake to the bedrock ridge 5 km to the 
west of the lake basin®”*. Thus, we find no reason to add complexity associated 
with small variations in basal topography and/or to correct the GPS displacements 
for the vertical distance gained as the stations move up and out of the basin. 

We used a maximum likelihood estimation (MLE) algorithm to determine
appropriate ranges of values for the spatial (γ) and temporal (α) hyperparameters
for both the vertical and basal planes. We determined the final hyperparameter
values on the basis of a combination of MLE estimates and analysis of NIF output
to identify hyperparameter values low enough to provide significant model
smoothing, but high enough still to track station displacements during the few
hours of rapid drainage in the time series. The MLE estimates provide an average
value that is appropriate over the entire time series but therefore oversmooths
the periods of rapid deformation, when higher values of α and γ are warranted by
the data. The MLE calculations of the vertical-plane temporal hyperparameter
suggest values of α of 100 (2011), 200 (2012), and 1,000 (2013); however, slightly
higher or lower α values of 150 (2011), 250 (2012), and 500 (2013) were used on the
basis of NIF ability to track station displacements during the rapid drainage
(Extended Data Fig. 8). The vertical-plane spatial smoothing parameter, γ, could
not be constrained on the basis of MLE calculations. The MLE calculations suggest
a higher than necessary value of the spatial hyperparameter for the vertical crack,
resulting in unrealistic vertical plane opening and closing on spatial scales of <0.5
km along strike. Therefore, we set the vertical plane spatial parameter to γ = 450
for the 2011 and 2012 inversions, resulting in vertical plane opening and closing on
scales of 1 km along strike. 

The basal plane spatial and temporal hyperparameters were also not
satisfactorily constrained by MLE calculations for 2011 and 2012. The MLE calculations
recommended higher than necessary spatial and temporal hyperparameters,
resulting in unrealistic, over-fit solutions for the basal plane. The chosen basal
plane temporal parameter (α = 25) is substantially lower than the temporal
parameter of the vertical plane, resulting in a smoother solution of bed opening
and slip. The chosen basal plane spatial parameter (γ = 50) resolves basal slip and
opening on spatial scales of 2 km, consistent with our 1-3 km GPS station spacing
on the ice-sheet surface (Fig. 1). For 2013, the rapid oscillatory variations in the
crack-normal component of displacement (north) for several stations required a
larger basal-plane spatial parameter (γ = 500) to allow the migration of sufficiently
compact slip patches needed to fit the oscillations in the crack-normal data
(Supplementary Discussion: 2013 North Lake Drainage). 
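
The idea of choosing a temporal smoothing hyperparameter at the minimum of a −2 × log-likelihood curve can be shown with a deliberately simplified stand-in. The sketch below is not the NIF: it runs a scalar Kalman filter for a single random-walk state observed with white noise, and scans a hypothetical list of candidate values. All names and the synthetic data are assumptions.

```python
import numpy as np

def neg2_log_likelihood(y, alpha, sigma, dt=1.0):
    """-2 ln L of a scalar random-walk state observed with white noise.

    alpha: random-walk scale (larger values mean less temporal smoothing).
    sigma: standard deviation of the white observation noise.
    """
    state, var = y[0], sigma ** 2  # initialise from the first observation
    total = 0.0
    for obs in y[1:]:
        var_pred = var + alpha ** 2 * dt        # prediction step
        innov = obs - state                     # innovation
        innov_var = var_pred + sigma ** 2
        total += np.log(2.0 * np.pi * innov_var) + innov ** 2 / innov_var
        gain = var_pred / innov_var             # Kalman gain
        state = state + gain * innov            # update step
        var = (1.0 - gain) * var_pred
    return total

def pick_alpha(y, candidates, sigma):
    """Return the candidate with the smallest -2 ln L, plus all scores."""
    scores = [neg2_log_likelihood(y, a, sigma) for a in candidates]
    return candidates[int(np.argmin(scores))], scores

# Synthetic series: random walk plus white noise (seeded for repeatability).
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.0, 1.0, 400)) + rng.normal(0.0, 0.5, 400)
best_alpha, all_scores = pick_alpha(series, [0.01, 0.1, 1.0, 10.0], sigma=0.5)
```

As in the text, a likelihood-based value of this kind is an average over the whole series, so a somewhat different value may be preferred when a short interval of rapid change must be tracked.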

Basal moment calculations. Basal slip moment M_0, in newton metres, is
calculated to provide an integrated measure of slip across the basal plane:

$$M_0 = \mu A D \qquad (2)$$

where μ, the shear modulus for glacial ice, is taken to be 3.5 GPa (ref. 41), A is the
area of the basal plane in square metres, and D is the mean bed slip across the basal
plane just after drainage in metres (Extended Data Table 1). Moment magnitude
(M_w) is calculated from the basal slip moment’: M_w = (2/3) log₁₀(M_0) − 6.05
(Extended Data Table 1). 
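
Equation (2) and the moment-magnitude relation can be evaluated as in the sketch below. The shear modulus and basal plane area are those given in the text; the 0.1 m mean slip is a made-up input for illustration only, and the function names are hypothetical.

```python
import math

def basal_slip_moment(mean_slip_m, area_m2=100e6, shear_modulus_pa=3.5e9):
    """M_0 = mu * A * D in newton metres (equation (2)).

    Defaults: 100 km^2 basal plane and 3.5 GPa shear modulus, as in the text.
    """
    return shear_modulus_pa * area_m2 * mean_slip_m

def moment_magnitude(moment_nm):
    """M_w = (2/3) * log10(M_0) - 6.05, with M_0 in newton metres."""
    return (2.0 / 3.0) * math.log10(moment_nm) - 6.05

# Hypothetical mean bed slip of 0.1 m (not a value from the paper).
m0 = basal_slip_moment(0.1)   # 3.5e16 N m
mw = moment_magnitude(m0)     # about 4.98
```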

Critical volume for driving water-filled hydro-fracture to bed. The critical 
volume of water necessary to keep a 4-km long crack open to the bed ranges from 
0.0008 to 0.0020 km³. This estimate is derived on the basis of a mean crack 
opening of 0.2-0.5 m required to drive a 4-km long, 100% water-filled vertical 
crack through 1 km of glacial ice with a shear modulus of 1.5-3.9 GPa (ref. 7). 
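
The quoted volume range follows from treating the crack as a slab of length × depth × mean opening. A short check, with a hypothetical function name:

```python
def crack_volume_km3(length_m, depth_m, mean_opening_m):
    """Water volume filling a vertical crack, idealised as a rectangular slab."""
    return length_m * depth_m * mean_opening_m / 1e9  # m^3 to km^3

# 4-km long crack through 1 km of ice, mean opening 0.2 m and 0.5 m.
lower = crack_volume_km3(4000.0, 1000.0, 0.2)  # 0.0008 km^3
upper = crack_volume_km3(4000.0, 1000.0, 0.5)  # 0.0020 km^3
```

Both bounds are smaller than the 2011 and 2013 lake volumes reported in Extended Data Table 1.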
North Lake basin stresses. To calculate background viscous stresses in North
Lake basin, we use Glen’s flow law** to convert longitudinal (along flow) surface
strain rates derived from TerraSAR-X 2009-2011 winter velocity measurements”
to longitudinal stresses, σ_xx:

$$\sigma_{xx} = A^{-1/n}\,\dot{\varepsilon}_e^{(1-n)/n}\,\dot{\varepsilon}_{xx} \qquad (3)$$

where the creep parameter, A, is 3.5 × 10⁻²⁵ s⁻¹ Pa⁻³ (ref. 44), n = 3 is the creep
exponent, ε̇_e is the two-dimensional effective strain rate, and ε̇_xx is the longitudinal
strain rate. We use the same Green’s functions from the NIF’’ to calculate the
change in crack-normal stress (Δσ_n) on the hydro-fracture that was induced by the
basal cavity opening and accumulated extra basal slip during the 2 days leading up
to each drainage event. 
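
A minimal numerical sketch of the strain-rate-to-stress conversion in equation (3) is given below. It assumes the standard inverse form of Glen's flow law and a creep parameter of 3.5 × 10⁻²⁵ s⁻¹ Pa⁻³; the units of A are garbled in the extracted text, so that value is an assumption here, as are the function name and the example strain rate of 0.01 yr⁻¹.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def longitudinal_stress_pa(eps_xx, eps_e, creep_a=3.5e-25, n=3):
    """sigma_xx = A**(-1/n) * eps_e**((1 - n)/n) * eps_xx.

    eps_xx: longitudinal strain rate (s^-1); may be negative (compression).
    eps_e: effective strain rate (s^-1); must be positive.
    creep_a: creep parameter A in s^-1 Pa^-3 (assumed value, see lead-in).
    """
    if eps_e <= 0:
        raise ValueError("effective strain rate must be positive")
    return creep_a ** (-1.0 / n) * eps_e ** ((1.0 - n) / n) * eps_xx

# Hypothetical uniaxial case: effective rate equals the longitudinal rate.
rate = 0.01 / SECONDS_PER_YEAR                 # 0.01 per year in s^-1
sigma = longitudinal_stress_pa(rate, rate)     # roughly 97 kPa, tensile
```

When the effective and longitudinal rates coincide, the expression reduces to (ε̇_xx / A)^(1/n), which gives a quick sanity check on the implementation.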

Data. Source data for all figures and videos are available in the online version of the 
paper as Microsoft Excel spreadsheets. 

Precursors observed in previous studies. Although our study is the first to 
interpret the cause and significance of precursors to rapid lake drainage, similar 
precursors have been observed before other recorded rapid supraglacial lake drai- 
nages on the western margin of the Greenland Ice Sheet in the form of GPS station 
uplift and steady lake level lowering in the hours before hydro-fracture’*. During a 
rapid North Lake drainage in 2006, uplift of the North Lake Base Station GPS station 
and steady lake level lowering were observed before rapid lake level drop and 
northward ice motion indicative of hydro-fracture opening’. Slow lake level low- 
ering was also observed before a 2008 rapid drainage of South Lake (68.58° N, 
49.39° W), another lake site in this region located 20 km south of North Lake’. 
During the rapid drainage of Lake F (67.01° N, 48.74° W) in 2010, uplift of two 
GPS stations on the eastern side of the lake was observed over the 7 h leading up to 
rapid drainage’. Precursory motion was not observed during the rapid drainage of 
Lake Ponting in 2011 (69.57° N, 49.81° W); however, the four GPS stations used 
to record ice motion may have been located too far from the lake to record 
precursory motion’. The three rapid drainage precursors observed during 2011- 
2013 at North Lake allow us to reinterpret precursors of past rapid lake drainages 
as evidence of a hydrologically induced trigger for hydro-fracture. Further, our 
results provide a possible mechanism by which a lake drainage could generate 
a meltwater pulse that could trigger additional lake drainages in the vicinity. 
Such regionally clustered lake drainages have been noted in previous lake 
drainage studies””. 

Implications for inland expansion of seasonal acceleration. The formation and 
drainage of high-elevation lakes has been invoked to explain the inland expansion 
of seasonal acceleration (enhanced summer velocities up to 8% above winter 
velocities) during high melt summers now’ and in the future'*. The precursors 
observed here suggest enhanced meltwater transport to the bed beneath lakes is 
needed to generate tensile stress transients that promote the initiation of surface- 
to-bed hydro-fractures. For this proposed hydro-fracture initiation mechanism, 
there must exist both a sufficient reservoir of surface meltwater and a nearby 
surface-to-bed pathway to transport the meltwater to the bed beneath the lake. 
Our results inform our hypothesis that rapid lake drainages are unlikely to pro- 
gress inland to areas of new surface melting based on the overall decline in tensile 
strain rates towards the ice sheet interior, which results in increasingly rare cre- 
vasses with elevation”’. Thus, much of the new surface melt in the interior probably 
drains via long (tens of kilometres) supraglacial streams that eventually terminate 
in moulins in regions where surface meltwater already reaches the bed at present”. 
While surface melt may continue to expand inland, much of this meltwater will 
only reach the bed in areas further downstream where seasonal lubrication already 
occurs'*!°”°. An alternative explanation for the observed seasonal acceleration is 
longitudinal coupling of these higher elevations (above ~1,600 m above sea level) 
to lower elevation regions (below ~1,600 m above sea level) that are responding to 
increased melt input*?***“*. 

The 2012 North Lake drainage. In 2012, North Lake drained rapidly over a 
period of ~5 h beginning at 22:12 local time on 9 June 2012. Owing to the lack 
of satellite images bracketing the North Lake drainage window, the maximum 
volume of North Lake in 2012 is unknown before drainage. Using RACMO runoff 
estimates, we conclude that the 2012 North Lake volume was similar to the 2011 
volume based on a difference of +0.00012 kg m⁻² of cumulative runoff between 
the 2 years (see ‘Lake volume’). 

Regional basal slip before the 2012 North Lake hydro-fracture initiation
indicates the presence of increased basal meltwater at the ice-sheet bed before
hydro-fracture. Over the 16 h leading up to North Lake hydro-fracture initiation, stations
NL01 and NL02 experienced an additional 5 cm of flowline-parallel displacement
(Fig. 2b and Extended Data Fig. 4a). At the end of the precursor, slip in the
northern portion of the array resulted in a basal moment (M_0) of 10’ N m
(Fig. 2e and Extended Data Fig. 4a), although there was minimal basal cavity
opening throughout the array (Extended Data Fig. 4b). Over the following 3.5
h, the hydro-fracture opened beneath the North Lake basin, reaching its maximum
width 2.5 h after hydro-fracture initiation. As in the 2011 North Lake rapid
drainage, basal cavity opening was centred beneath and to the south of the
North Lake basin during the 2012 drainage (Extended Data Fig. 4e and
Supplementary Video 2). During lake drainage, basal slip occurred beneath and to
the southwest of the North Lake basin, while the basal slip initially
accumulated to the north during the precursor (Extended Data Fig. 4a) remained and
expanded south (Extended Data Fig. 4d). 
The 2013 North Lake drainage. In 2013, GPS station attrition in the eastern half 
of the array precluded various array-scale NIF conclusions; however, precursory 
activity in the western half of the array was well resolved (Extended Data Fig. 5). In 
2013, North Lake and a small lake (Small Lake; Extended Data Fig. 5) 2 km to the 
southwest of North Lake may have drained concurrently or in sequence. Positive 
crack-normal (approximately northward) motion during the North Lake hydro- 
fracture was observed at the three GPS stations located between Small Lake and 
North Lake (NL08, NL07, NL10). This can be seen in the positive crack-normal 
excursion of 0.1 m at station NL08, 0.25 days after the 2013 maximum North Lake 
hydro-fracture crack opening (Fig. 2c). 

Available imagery does not capture the precise timings of the lakes’ drainages. 
The last WorldView image obtained on 17 June 2013 (2 days before the 2013 
North Lake drainage) shows a filled North Lake and Small Lake. The volume of 
North Lake (Small Lake) was at least 0.0036 ± 0.001 km³ (0.0021 ± 0.001 km³) at 
the time of drainage based on the shoreline positions obtained 2 days before the 
North Lake drainage event (Extended Data Figs 2d and 3b). The first post-drainage 
WorldView image available on 5 July 2013 (16 days after North Lake drainage; 
Extended Data Fig. 2e) shows an empty North Lake and Small Lake, with a bright, 
linear crack running through the South Lake basin that could be the 2013 Small 
Lake hydro-fracture. July 2014 field surveys confirmed the existence and east-west 
strike of the Small Lake hydro-fracture. 

In an attempt to distinguish 2013 North Lake and Small Lake hydro-fracture 
events, a NIF including an additional source of displacement as a vertical plane 
with tensile opening along the South Lake hydro-fracture crack trace was 
developed and run with the 2013 GPS data. The NIF with the additional Small 
Lake hydro-fracture (‘four-source’) did not accurately capture opening and 
closing along the Small Lake hydro-fracture. While including the Small Lake 
hydro-fracture more completely fits NL07, NL08, and NL10 station motion during 
the 1.5 h following the North Lake rapid drainage, the NIF results for the Small 
Lake hydro-fracture exhibit unrealistic behaviour by continuing to widen 
throughout the day after North Lake drainage. We attribute this result to the 
non-uniqueness inherent in the inversion owing to the lack of stations between 
North Lake and Small Lake. Because the NIF results including the Small Lake 
source produce a physically unlikely result (a crack that continues widening after 
drainage) we favour the alternative solution, which fits the data equally well— 
namely no Small Lake hydro-fracture and a rougher distribution of basal slip/ 
opening (γ = 500) that accounts for the crack-normal component oscillations at 
NL07 and NL08 via the spatial propagation of the basal slip patch (Fig. 2c). 

For a NIF that does not include the additional Small Lake vertical plane (‘three- 
source’), individual station motion can be mapped onto the three original 
deformation sources (vertical opening, basal cavity opening, and basal slip) with 
a highly spatially resolved basal plane (γ = 500) and associated highly temporally 
resolved vertical plane (α = 500). The three-source 2013 NIF yields realistic 
opening and closing North Lake hydro-fracture behaviour. We present the 
three-source 2013 NIF results here, since we cannot sufficiently distinguish 
between the North Lake and Small Lake drainages from the available station 
spatial density. 

Independent of the NIF setup (three or four sources of displacement), precurs- 
ory activity in the western half of the array is well resolved in the GPS data. North 
Lake drained rapidly over a period of ~5 h beginning at 15:00 local time on 19 June 
2013 (Fig. 2c, f). From analysis of WorldView imagery, the 2013 west North Lake 
shoreline had not reached M1 2 days before the drainage event (Extended Data 
Fig. 3b), although, in the absence of a snow-dam, water could have reached the 
moulin via a deeply incised surface meltwater channel (Extended Data Fig. 2d). 
During the 16 h leading up to hydro-fracture initiation, flowline parallel speed-up 
of western stations (Fig. 2c) generated considerable M_0 (Fig. 2f), and was
coincident with a basal cavity opening of ~0.002 km³ beneath the North Lake and Small
Lake basins (Fig. 2f and Extended Data Fig. 5a-c). Inversion results suggest that
hydro-fracture opening began in the region of M1; however, opening along the
eastern portion of the vertical plane was not well constrained owing to a lack of
GPS stations to the immediate northeast of North Lake basin (for example, NL05,
NL06, and North Lake base station). As in previous years, during the North Lake
rapid drainage, basal cavity opening occurred beneath the lake basin, while extra
basal slip extended further afield (Extended Data Fig. 5d-f and Supplementary
Video 3). A ground survey of North Lake basin a month after the 2013 North Lake
drainage identified post-drainage supraglacial meltwater routeing through M2
(Extended Data Fig. 1). 

Extended Data Figure 1 | WorldView image taken on 21 July 2011 of an
empty North Lake basin after the 2011 rapid drainage event. Yellow outline
shows M1 and M2 location along the hydro-fracture trace (endpoints marked
by black arrows). Yellow triangles mark GPS stations within the map area.
Image copyright 2015 DigitalGlobe. 

[Extended Data Figure 2 panels: a, North Lake 6/17/2011; b, North Lake 7/21/2011; c, North Lake DEM 7/21/2011; f, North Lake DEM 7/5/2013.]

Extended Data Figure 2 | Images and DEM of 2011 and 2013 North Lake
basin. a (d), WorldView image chosen to map the 2011 (2013) North Lake
pre-drainage shoreline position. b (e), WorldView image of an empty North Lake
basin obtained on 21 July 2011 (5 July 2013) used to create the 2011 (2013)
North Lake DEM. c (f), The 2-m horizontal resolution DEM (2-m vertical
contours in black) for the North Lake region, with the North Lake shoreline
(red), M1 (yellow), and hydro-fracture trace (blue) mapped over contours.
Images copyright 2015 DigitalGlobe. 

Extended Data Figure 3 | North Lake depths in 2011 and 2013. Two-metre
resolution DEMs were created from the first available post-drainage WorldView
stereo pair obtained of the region in (a) 2011 and (b) 2013. Shoreline
positions from 2011 and 2013 derived from last pre-drainage WorldView or
TerraSAR-X images obtained over the region are shown in red. The last
pre-drainage WorldView image for 2011 was acquired 2 days before the drainage event
on 17 June 2011; the last pre-drainage SAR image for 2013 was acquired 1 day before
the event on 17 June 2013. Filling the empty basin DEM up to the greatest
known pre-drainage shoreline extent generated North Lake depths (1-m
vertical contours in black) in relation to the greatest known pre-drainage
shoreline extents and were used to calculate minimum 2011 and 2013 North
Lake pre-drainage volumes. The trace of the vertical hydro-fracture crack is
shown in grey; M1 is outlined in yellow. 

[Extended Data Figure 4 panels: a, 2012 hydro-fracture initiation, DOY 161.72; d, 2012 maximum hydro-fracture opening, DOY 161.85.]

Extended Data Figure 4 | The 2012 basal slip and cavity opening at
hydro-fracture initiation and maximum hydro-fracture opening. NIF-calculated
(a) extra basal slip accumulated, (b) basal cavity opening, and (c)
hydro-fracture crack opening at the time of the 2012 (a-c) hydro-fracture initiation
and (d-f) maximum hydro-fracture opening (time points shown in Fig. 2a).
Moulin location, last known lake shoreline, GPS stations, and NIF vertical crack
surface trace derived from SAR imagery are shown as a yellow circle, blue line,
black triangles, and black line, respectively. Vector fields show GPS (NIF)
displacement less background velocities in black (green) for (a) the period
between the start of the precursor and hydro-fracture initiation, and (d) the
period between hydro-fracture initiation and maximum hydro-fracture
opening. Error ellipses of 1 sigma are shown for the GPS displacements (blue
ellipses). Basal sub-elements are 0.83 km by 0.83 km, resulting in 144
sub-elements over a 10 km × 10 km region. 

Extended Data Figure 5 | The 2013 basal slip and cavity opening at
hydro-fracture initiation and maximum hydro-fracture opening. NIF-calculated
(a) extra basal slip accumulated, (b) basal cavity opening, and (c)
hydro-fracture crack opening at the time of the 2013 (a-c) hydro-fracture initiation
and (d-f) maximum hydro-fracture opening (time points shown in Fig. 2a).
Moulin location, last known lake shoreline, GPS stations, and NIF vertical crack
surface trace derived from SAR imagery are shown as a yellow circle, blue line,
black triangles, and black line, respectively. Vector fields show GPS (NIF)
displacement less background velocities in black (green) for (a) the period
between the start of the precursor and hydro-fracture initiation, and (d) the
period between hydro-fracture initiation and maximum hydro-fracture
opening. Error ellipses of 1 sigma are shown for the GPS displacements (blue
ellipses). Basal sub-elements are 0.83 km by 0.83 km, resulting in 144
sub-elements over a 10 km by 10 km region. 

Extended Data Figure 6 | The 2011 station time series. a-c, Flowline,
crack-normal, and uplift GPS displacements (in metres) (grey stars), respectively, for
stations used in the 2011 NIF. NIF station fits from the three displacement
sources (Extended Data Fig. 7) are shown in red, and NIF station fits including L(t)
(random benchmark wobble term) are shown in black. Stations are ordered
roughly north to south on the y axis, offset by 0.5 m. 

[Extended Data Figure 7 panels: stations NL04 and NL08. Legend: GPS station displacement; NIF hydro-fracture opening; NIF basal cavity opening; NIF extra basal slip; sum of 3 NIF displacement sources. Vertical axes: flowline displacement (m), crack-normal displacement (m). Horizontal axis: days since 2011 rapid drainage.]

Extended Data Figure 7 | The 2011 NL08 and NL04 station flowline,
crack-normal, and uplift displacements computed from NIF displacement
sources. Flowline, crack-normal, and uplift GPS displacements less
background velocity field (grey stars) are plotted for (a-c) NL04 and
(d-f) NL08 over the 2 days before and 1 day after the 2011 rapid North Lake
drainage. These stations are two examples chosen from the full array because
they capture displacement on both the northern (NL04) and southern (NL08)
side of the lake, are located at roughly the same longitude as M1, and are
within 2 km of the lake. NIF-calculated surface ice displacements at NL04 and
NL08 stations from the three displacement sources are plotted for the (red)
hydro-fracture crack opening, (green) basal cavity opening, and (blue) extra
basal slip. The sum of all three NIF displacement sources is shown in black. 

[Extended Data Figure 8 horizontal axis: vertical crack temporal smoothing parameter, alpha.]

Extended Data Figure 8 | MLE of NIF hyperparameters. MLE of the
vertical hydro-fracture plane temporal smoothing parameter, α, for (a) 2011,
(b) 2012, and (c) 2013 NIF. The MLE corresponds with the minimum value on
the −2 × likelihood plots’’. Maximum likelihood estimates are outlined in red
circles, with the value used in each year’s inversion indicated with a
black diamond (Methods). 
a. Supraglacial Lake Formation: A supraglacial lake forms on the surface of the Greenland Ice Sheet when surface runoff fills a compressive basin.
b. A Precursor to Rapid Lake Drainage: Substantial melt-water is routed to the bed via a moulin, causing uplift and tension at the ice sheet surface.
c. Hydro-fracture Opening and Rapid Drainage: A hydro-fracture opens through the lake basin, draining water in the lake to the bed within a few hours.

Extended Data Figure 9 | Stress changes across North Lake basin. Stress changes during (a) supraglacial lake formation, (b) rapid drainage precursor, and (c) hydro-fracture opening.
Extended Data Table 1 | The 2011, 2012, and 2013 North Lake drainage environmental, GPS, and NIF observations

Environmental | 2011 | 2012 | 2013
Day of Year | June 18, 2011 (DOY 169) | June 9, 2012 (DOY 161) | June 19, 2013 (DOY 170)
Start of Precursor (decimal DOY GMT+0) | 168.85 | 161.20 | 169.90
Hydro-fracture Initiation | 169.21 | 161.72 | 170.45
Maximum Hydro-fracture Opening | 169.32 | 161.85 | 170.55
Drainage Duration* | ~3 hours | ~5 hours | ~5 hours
Lake Volume (DEM) (km³) | 0.0077 ± 0.001 | approx. same as 2011 | 0.0036 ± 0.001; Small Lake: 0.0021 ± 0.001
Lake Shoreline Location at Drainage | Meets M1 | approx. same as 2011 | May fill channel trough to M1

GPS | 2011 | 2012 | 2013
Background velocity magnitude average across all stations (m/year) | 162 | 125 | 94
Background velocity direction average across all stations (deg) | 276° | 277° | 277°
Precursor type | Uplift in lake basin, followed by speed up in lake basin | Speed up NL01 and NL02 stations; minor uplift in basin | Speed up of western stations (FL03, NL04, NL07, NL08, NL10); uplift in basin
Duration of Precursor before Drainage starts (hours) | 10 (uplift); 5 (speed up) | 16 (N speed up); 24 (minor uplift in basin) | 16 (possibly as early as 24 hours before)

Network Inversion Filter | 2011 | 2012 | 2013
Vertical Crack Initiation Location | M1 | Center of lake basin | M1
Vertical Crack Propagation History | Propagates from M1 to Lake Basin | Stays in center of lake basin | Stays at M1 (east unresolved)
Max. Vertical Crack Opening Width (m) | 0.16 | 0.36 | 0.40
Max. Vertical Crack Volume (km³) | 2.9x 104 | 73x 107 | 4.9x 107
Max. Basal Cavity Opening Location | Lake Basin | Lake Basin | Lake Basin
Max. Basal Cavity Volume (km³) | 0.0095 | 0.0067 | 0.012
Max. Extra Basal Slip Locations | Lake Basin, SW of Lake Basin | Lake Basin, SW & NW of Lake Basin | Lake Basin and Western Stations
Average extra basal slip across basal plane just after drainage (m) | 0.13 (DOY 169.5) | 0.15 (DOY 162.0) | 0.31 (DOY 170.7)
Mb (basal moment) (N*m) just after drainage | 4.6 x 10% | 5.3.x 10% | 11x10”
Mw (moment magnitude) just after drainage | 5.1 | 5.1 | 5.3

Time of start of precursor, start of hydro-fracture crack opening, and maximum hydro-fracture crack opening equivalent to time delineations are shown in
*Drainage duration calculated as duration of southward anomaly in the NL08 crack-normal time series (See Methods: GPS data).
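
The event times in the table are given as decimal day of year (DOY, GMT+0). As a minimal sketch of how such values can be turned into calendar timestamps and elapsed hours, the Python below assumes that DOY 1.0 corresponds to 00:00 on 1 January; that convention, and the function names `doy_to_datetime` and `hours_between`, are illustrative assumptions rather than anything stated in the source.

```python
from datetime import datetime, timedelta, timezone


def doy_to_datetime(year: int, decimal_doy: float) -> datetime:
    """Convert a decimal day of year (UTC) to a timezone-aware datetime.

    Assumption: DOY 1.0 is 00:00 UTC on 1 January of the given year.
    """
    start_of_year = datetime(year, 1, 1, tzinfo=timezone.utc)
    return start_of_year + timedelta(days=decimal_doy - 1.0)


def hours_between(doy_start: float, doy_end: float) -> float:
    """Elapsed time in hours between two decimal DOY values in the same year."""
    return (doy_end - doy_start) * 24.0


if __name__ == "__main__":
    # 2011 values taken from the table: hydro-fracture initiation and maximum opening.
    print(doy_to_datetime(2011, 169.21).isoformat())
    print(round(hours_between(169.21, 169.32), 2), "hours from initiation to maximum opening")
```

Under this convention the integer DOY values in the table map onto the calendar dates listed beside them (for example, DOY 169 of 2011 is June 18).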
Cardiac lymphatics are heterogeneous in origin and respond to injury

Linda Klotz, Sophie Norman, Joaquim Miguel Vieira, Megan Masters, Mala Rohling, Karina N. Dubé, Sveva Bollini, Fumio Matsuzaki, Carolyn A. Carr & Paul R. Riley


The lymphatic vasculature is a blind-ended network crucial for tissue-fluid homeostasis, immune surveillance and lipid
absorption from the gut. Recent evidence has proposed an entirely venous-derived mammalian lymphatic system. By
contrast, here we show that cardiac lymphatic vessels in mice have a heterogeneous cellular origin, whereby formation
of at least part of the cardiac lymphatic network is independent of sprouting from veins. Multiple Cre-lox-based lineage
tracing revealed a potential contribution from the putative haemogenic endothelium during development, and discrete
lymphatic endothelial progenitor populations were confirmed by conditional knockout of Prox1 in Tie2⁺ and Vav1⁺
compartments. In the adult heart, myocardial infarction promoted a significant lymphangiogenic response, which was
augmented by treatment with VEGF-C, resulting in improved cardiac function. These data prompt the re-evaluation of a
century-long debate on the origin of lymphatic vessels and suggest that lymphangiogenesis may represent a therapeutic
target to promote cardiac repair following injury.


In 1902, Florence Sabin proposed that the primary lymph sacs ori- 
ginate from the embryonic veins and then give rise to the entire 
lymphatic vasculature by sprouting and remodelling’. An alternative 
model of lymphatic development was proposed by Huntington and 
McClure in 1910, who suggested that lymph sacs arise in the mesench- 
yme, independently of veins, via distinct progenitor cells’. More 
recent evidence has supported Sabin’s model, such that trans-differentiation
of venous into lymphatic endothelial cells (LECs) is now
widely accepted, with the veins regarded as the sole origin of the entire 
lymphatic vasculature in mammals*’. To date, studies which support 
a venous origin have focused exclusively on the development of the 
systemic lymphatic vasculature. Organ-based lymphatics have 
received little attention and in the heart, while the presence of cardiac 
lymphatic vessels has been described’, virtually nothing is known 
about their role during development or in the healthy or failing adult 
heart. We therefore sought to characterize the formation of the car- 
diac lymphatic vessels through developmental stages, to identify their 
embryonic origin and effect during organogenesis and to assess their 
response to pathological insult in the adult setting. 


Development of the cardiac lymphatic vasculature 


Whole-mount staining of murine hearts for early LEC markers
VEGFR-3 (ref. 9) and Prox1 (ref. 10) revealed the emergence of
lymphatic vessels at embryonic day 12.5 (E12.5), sprouting from
extra-cardiac regions proximal to the outflow tract, on the ventral
side (Fig. 1a, increased magnification in Fig. 1b). At E14.5, lymphatic
vessels were observed on the ventricular surface sprouting from the
region of the sinus venosus, on the dorsal side (Fig. 1c, increased
magnification in Fig. 1d and Extended Data Fig. 1a, increased mag-
nification in Extended Data Fig. 1b). At E16.5 the major dorsal vessels
spread inferiorly from the inflow region (Fig. 1e, increased magnifica-
tion in Fig. 1f), while ventrally smaller vessels arose between the atria
(Extended Data Fig. 1c, d). By E18.5, the vessels continued to expand
and projected towards the apex of the heart on both dorsal and ventral
surfaces (Fig. 1g, h and Extended Data Fig. 1e, f). From birth (post-
natal day 0 (P0)), the vessels developed a more extensive branched
network and expanded further over the ventral side of the
neonatal heart (Fig. 1i, j). By P10, the cardiac lymphatics provided
superficial coverage of the majority of the epicardial surface of
the heart (Extended Data Fig. 1g, h) and appeared fully developed
by P15 (Extended Data Fig. 1i, j). The lymphatic identity of the
VEGFR-3- and Prox1-labelled cardiac vessels (Fig. 1a-j and
Extended Data Fig. 1a-n) was further validated by co-immunostain-
ing for the lymphatic vessel endothelial hyaluronan receptor 1 (Lyve-
1), which also labels tissue macrophages'’. Coronary LECs within the
expanding plexus on both dorsal and ventral sides of the developing
heart co-expressed VEGFR-3, Prox1 and Lyve-1 (Extended Data Fig.
1o-v). Cardiac lymphatic vessels aligned with the endomucin
(Emcn)-positive coronary veins during late gestation (E15.5-18.5)
(Fig. 1k-m) and established extensive inter-vessel connections ana-
logous to blood vessel anastomosis (Fig. 1n-p). At birth (P0) lateral
Lyve-1⁺ sprouts beneath smooth-muscle-actin-positive coronary
veins (Fig. 1q-s) were indicative of a close anatomical relationship
between the coronary veins and developing lymphatic vasculature
(Fig. 1t).


A venous and non-venous contribution of LECs


Prox1⁺ LECs did not appear to emerge or bud-off from Emcn-expres-
sing coronary vessels between E12.5-14.5 (Extended Data Fig. 2a-i).
Instead, extra-cardiac LECs migrated into the sinus venosus on the
dorsal side, and outflow tract on the ventral side of the heart by E12.5
(Extended Data Fig. 2a, b, also Fig. 1a, b) and expanded to form a
network proximal to Emcn⁺ veins from E13.5 and E14.5 (Extended
Data Fig. 2d-i) through to E17.5 (Extended Data Fig. 2j-o). Whole
embryo staining at E10.5 and E12.5 (Extended Data Fig. 3a-f)
revealed a Prox1/VEGFR-3-expressing LEC population emerging
from the Emcn⁺ common cardinal vein and migrating towards
the neighbouring sinus venosus and outflow tract (Extended Data


Figure 1 | Spatiotemporal development of the murine cardiac lymphatic
vasculature. a-h, Whole-mount confocal imaging of embryonic hearts stained
with VEGFR-3 and Prox1 at E12.5 (a; white box enlarged in b), E14.5 (c; white
box enlarged in d), E16.5 (e; enlarged box in f) and E18.5 (g; enlarged box in
h). i, j, From birth (P0), lymphatic vessels branch and expand further onto the
dorsal epicardial surface of the heart (i; enlarged box in j). Schematics below the
images represent the stages of lymphatic vessel development (n = 5 hearts
analysed per time point). k, l, Whole-mount staining with Emcn (veins) and
Lyve-1 (lymphatics). m, 3,3’-diaminobenzidine (DAB) staining with VEGFR-
3. n-p, Enlarged images of box in k stained for Emcn (n), Lyve-1 (o) or both
(p). White arrowhead in o highlights a coronary vein. q-s, α-Smooth muscle
actin (SMA)- (veins, q) and Lyve-1-stained (lymphatics, r) hearts (white
arrowhead in r indicates location of blood vessels) at later stages (P0). s, Merge
of SMA and Lyve-1 staining. t, Schematic representation of the dorsal side of the
heart at P10 (shown in Extended Data Fig. 1g, h; n = 5 hearts analysed per time
point). CA, coronary artery; CV, coronary vein. Scale bars: a, c, e, 750 µm; b, 300
µm; g, 1 mm; i, 2 mm; k-m, 200 µm; p, 10 µm; s, 5 µm.


Fig. 3d-f), suggesting that cardinal-vein-derived endothelial cells may 
be the venous source of coronary lymphatic vessels, an observation 
supported by previous studies”’. 

To investigate the lymphatic cellular origin further, we first per- 
formed lineage-tracing experiments using a Tie2-Cre line’* with a 
R26R-eYFP reporter’, revealing labelling of the embryonic cardinal 
vein at E10.5 (Extended Data Fig. 4a-d). At E12.5, Emcn⁺ jugular
(cardinal) veins and lymph sacs, contributors to the systemic lymph-
atic vascular network®, were both YFP⁺ and Lyve-1⁺ (Fig. 2a-f). In
contrast, E14.5 hearts revealed lymphatic vessels proximal to the out-
flow tract region which were YFP⁻ (Fig. 2g, h), despite complete Tie2-
eYFP recombination and labelling of lymphatics elsewhere in the
embryo (Extended Data Fig. 4a-c). The relative incidence of YFP⁺
versus YFP⁻ lymphatic vessels in the heart was 78 ± 5.5% YFP⁺
versus 19 ± 3.3% YFP⁻ cells (mean percentage of cells ± s.e.m. per
field of view; n = 24 fields of view; six fields of view per heart, four 
hearts in total) and confirmed by orthogonal z-stack reconstruction 
Figure 2 | Incomplete contribution of Tie2⁺ venous-derived LECs indicates
a novel non-venous contribution to the developing cardiac lymphatics.
a, b, Tie2-Cre;R26R-eYFP embryos at E12.5 (a) stained with anti-GFP to detect
the eYFP reporter expression, -Lyve-1 and -Emcn antibodies (b). c-f, Enlarged
images of the area marked by the white box in b. Jugular lymph sacs (JLS) were
YFP⁺. g-i, Whole-mount staining with VEGFR-3, Lyve-1, or Prox1 revealed
incomplete recombination in cardiac lymphatic vessels (g) with evident YFP⁻
regions of vasculature containing Prox1⁺ nuclei (h); see confocal z-stack
reconstructions numbered 1-5 (i, as indicated by white inset boxes in g and
h). j-m, At E17.5, both Tie2-YFP⁺ vessels (j, k) and Tie2-YFP⁻ vessels
(l, m; highlighted by white arrowheads) were observed; n = 5 hearts analysed
per time point. JLS, jugular lymph sac; JV, jugular vein. Scale bars: a, 200 µm;
b-h, j-m, 100 µm.


(Fig. 2i), which revealed both YFP⁺ (Fig. 2j, k) and YFP⁻ vessels
(Fig. 2l, m) in the developing heart at E17.5.
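
The incidences above are reported as the mean percentage of cells ± s.e.m. per field of view. A minimal sketch of that summary statistic follows; the cell counts are invented purely for illustration (they are not data from the study), and the helper names `percent_positive` and `mean_sem` are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_positive(positive: int, total: int) -> float:
    """Percentage of reporter-positive cells in one field of view."""
    if total <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * positive / total


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of per-field values."""
    if len(values) < 2:
        raise ValueError("need at least two fields of view")
    return mean(values), stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical (positive, total) counts for six fields of view from one heart.
    fields = [(40, 50), (35, 50), (45, 60), (30, 40), (42, 50), (38, 48)]
    percentages = [percent_positive(p, t) for p, t in fields]
    avg, sem = mean_sem(percentages)
    print(f"{avg:.1f} ± {sem:.1f}% reporter-positive cells per field of view")
```

Treating each field of view as one observation, as here, is one reading of the reported n (fields of view rather than hearts).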

To confirm a non-venous contribution to cardiac lymphatic ves- 
sels, we analysed tamoxifen-inducible PDGFB-CreER™” mice, crossed 
with either R26R-tdTomato™ or R26R-mTmG’” reporter lines to spe-
cifically label endothelial cells lining blood vessels (Extended Data Fig.
4e-o). Incomplete recombination of tdTomato within Lyve-1⁺
lymphatic vessels (Extended Data Fig. 4f) was evident with both
tdTomato⁺/Lyve-1⁺ (Extended Data Fig. 4g-i) as well as
tdTomato⁻/Lyve-1⁺ lymphatic vessels (Extended Data Fig. 4j-l),
indicating a mixed contribution of endothelial- and non-endothe-
lial-derived cardiac lymphatics. This was supported by crosses with
an mTmG reporter mouse, where the level of GFP recombination
within cardiac lymphatic vessels was mosaic (Extended Data Fig. 
4m-o). 


A putative haemogenic source of cardiac LECs 

We next examined the possibility that an alternate source of LECs 
might arise from one of three potential cardiac progenitor popula- 
tions’*: the epicardium, cardiac mesoderm (early and late stage) or 
cardiac neural crest by lineage tracing with Wt1-CreERT2”, Mesp1-
Cre'’, Nkx2.5-Cre’? and Wnt1-Cre”® lines crossed with the R26R-
eYFP reporter, respectively. There was no contribution of Wt1⁺
(YFP⁺) cells to the developing coronary lymphatics, excluding the
pro-epicardial organ as a source of LECs (Extended Data Fig. 5a-c) 
Figure 3 | Vav1⁺, Pdgfrβ⁺ and Csf1r⁺ lineages contribute LECs to the
developing cardiac lymphatics. a, Vav1-Cre;R26R-tdTomato lineage tracing
revealed regions of tdTomato⁺ recombination throughout the heart at E17.5
(n = 4 hearts analysed). Co-labelled tdTomato⁺/Prox1⁺/Lyve-1⁺ cells were
confirmed by z-stack reconstructions (below a, 1-4; 5 and 6 lack Prox1 and are
tdTomato⁺/Lyve-1⁺ macrophages). b-d, Representative enlarged views of
alternative Vav1-Cre;R26R-tdTomato hearts. Left, tdTomato staining; middle,
Prox1 staining; right, merged images of tdTomato, Prox1 and Lyve-1 staining.
White arrowheads in b indicate cardiac lymphatic vessels. White arrows in
c indicate macrophages proximal to the Prox1⁺ LECs that were evident at
higher magnification. e-h, Pdgfrβ-Cre;R26R-mTmG E17.5 hearts including
z-stack reconstruction (e, stacks 1-4 are GFP⁺/Prox1⁺; 5 and 6 are singly
Prox1⁺; n = 3 hearts analysed) revealed a heterogeneous contribution of
recombined GFP⁺ cells that were Prox1⁺ (f, g) or VEGFR-3⁺
(h). f-h, Enlarged views of alternative Pdgfrβ-Cre;R26R-mTmG hearts.
i-l, Analyses of E17.5 hearts from Csf1r-CreER;R26R-tdTomato embryos
injected with 4-hydroxytamoxifen at E7.5 (i, n = 5 hearts analysed) revealed
contribution of recombined tdTomato⁺ cells that were Prox1⁺ (z-stack
projections 1-4 in i, and white arrowheads in j-l with accompanying z-stacks).
j-l, Enlarged views of alternative Csf1r-CreER;R26R-tdTomato hearts. Scale
bars: a, e, i, 60 µm; b-d, 30 µm; f, h, j, k, l, 100 µm; g, 200 µm.


and neither Mesp1⁺ or Nkx2.5-labelled lateral-plate-mesoderm-
derived progenitors (Extended Data Fig. 5d-i) nor Wnt1⁺ cardiac
neural crest cells (Extended Data Fig. 5j-l) contributed to the devel-
oping coronary lymphatics. Subsequently, we sought to determine
whether there might be a distinct Tie2⁻ endothelial source of LEC
progenitors. The haemogenic endothelium represents the site of
primitive haematopoiesis in the visceral yolk sac and developing
blood islands of the early embryo and while Tie2-Cre does label a
significant proportion of cells within the yolk sac, there are aggrega-
tions and primitive haematopoietic derivatives which are Tie2-nega-
tive’’. To potentially capture this Tie2⁻ population, we employed
three Cre-driver lines under the control of Vav1?*™, Pdgfrb** and
Csf1r’®, in combination with either the R26R-tdTomato or R26R-
mTmG reporters. Initially, we excluded reporter labelling of the endo-
thelium of the common cardinal vein by these three drivers at E10.0
(Extended Data Fig. 6a-d and data not shown) and jugular vein at
E12.5 (Extended Data Fig. 6e-h); however, we cannot exclude the
possibility of tracing a subset of venous-derived cells fated to form
LECs before any evidence of Prox1 expression. Extensive labelling of
Vav1-tdTomato⁺ cells was evident in regions of the developing heart
at E17.5 (Fig. 3a), including tdTomato⁺/Lyve-1⁺ tissue macrophages
(Fig. 3a), which were negative for Prox1 and located proximally to the
developing vessels (Fig. 3b, c). In contrast to the situation in the heart,
Vav1-tdTomato⁺ cells were not observed in the dermal lymphatics
from dorsal skin preparations analysed at E17.5 (n = 4 embryos;
Extended Data Fig. 5m-r). Subsequently, we confirmed the presence
of tdTomato⁺ cells within lymphatic vessels which were Prox1⁺ and
Lyve-1⁺ (Fig. 3a, b, d; 14 ± 5.3% tdTomato⁺/Prox1⁺; mean percent-
age of cells ± s.e.m. per field of view; n = 20 fields of view; five fields of
view per heart, four hearts in total), as confirmed by z-stack recon-
struction (Fig. 3a). In Pdgfrβ-Cre;R26R-mTmG reporter mice, GFP⁺
cells were observed in the coronary lymphatics at E17.5 (Fig. 3e),
which were positive for Prox1 (Fig. 3f, g) and VEGFR-3 (Fig. 3h; 28
± 4.7% GFP⁺/Prox1⁺; mean percentage of cells ± s.e.m. per field of
view; n = 18 fields of view; six fields of view per heart, three hearts in
total) (Fig. 3e with z-stack) and in hearts derived from Csf1r-
CreER;R26R-tdTomato embryos, a contribution of tdTomato⁺ cells
which co-labelled with Prox1 (Fig. 3i with z-stack) and Lyve-1
(Fig. 3j-l with z-stacks), further suggested a yolk-sac progenitor con-
tribution (Fig. 3i-l). The relative incidence of tdTomato⁺/Prox1⁺
cells was low (less than 5%), likely reflecting inefficient labelling by
the inducible Csf1r-CreER.

In order to investigate a yolk-sac contribution to LECs further, we 
derived ex vivo cultures of explanted Vav1-Cre;R26R-tdTomato con-
ceptuses at E8.0°”. Intact yolk sac explants were treated with 100 ng
ml⁻¹ of recombinant VEGF-C(C156S)", a potent selective lymphan-
giogenic cue that only signals via VEGFR-3 (the C156S mutation
prevents binding to VEGFR-2). A tdTomato⁺ outgrowth from the
yolk sac was observed under VEGF-C induction (Extended Data Fig.
6i, j) with specification of Prox1⁺ LECs in culture (Extended Data Fig.
6k-z). Since this stage of development was too early to detect a venous 
origin or alternate embryonic source, we conclude that these LECs 
were yolk-sac-derived. 


Prox1 loss of function supports dual LEC origin


To provide further evidence for both a venous-endothelium and inde-
pendent source of cardiac LECs, we genetically deleted Prox1 inde-
pendently in both the Tie2⁺ blood endothelial and Vav1⁺
compartments. We first used Prox1 conditional mice” crossed with
the Tie2-Cre mice (Extended Data Fig. 7a). Fluorescence-activated
cell sorting of targeted GFP⁺ cells from isolated Tie2-Cre;Prox1fl/fl
hearts (Extended Data Fig. 7b) revealed appropriate knock-down of
Prox1 (Extended Data Fig. 7c; 0.59-fold; n = 5 mutant hearts ana-
lysed; P = 0.05), accompanied by knock-down of Vegfr3 (also known
as Flt4; Extended Data Fig. 7d; 0.39-fold; n = 5 mutant hearts ana-
lysed; P = 0.01) and Lyve1 (Extended Data Fig. 7e; 0.22-fold; n = 5
mutant hearts analysed; P = 0.001). Tie2-Cre;Prox1fl/fl mutant
embryos had gross vascular anomalies, including ectopic surface
blood vessels, a disrupted vascular network and apparent haem-
orrhaging (Extended Data Fig. 7f-i). An initial failure in specification
of cardiac LECs was confirmed at E14.5, coincident with the first
emergence of the lymphatics on the dorsal surface of the heart
(Fig. 1c). GFP⁺-targeted and Lyve-1⁺ LECs were observed at the base
proximal to the atrioventricular region of the heart in Tie2-
Cre;Prox1fl/+ controls but were absent in the mutant hearts
(Extended Data Fig. 8a-f). There was no apparent effect on the cor-
onary vasculature, as determined by comparable whole-mount CD31
staining (Extended Data Fig. 8g, h). At E17.5 Tie2-Cre;Prox1fl/fl hearts
were recovered largely devoid of VEGFR-3⁺ LECs (Extended Data
Fig. 9a-d) and were relatively dysmorphic along the apical-basal
(long) axis (Extended Data Fig. 9c, d), with smaller chambers and
thickening of the ventricular compact layer (Extended Data Fig. 8i, j).
Despite these anomalies, endocardial cushion formation appeared
unaffected (Extended Data Fig. 8i, j). Relative to Tie2-Cre;Prox1fl/+
heterozygotes (Extended Data Fig. 9e-h), GFP⁺/Lyve-1⁺ lymphatic
vessels were either partially or completely absent from the dorsal
surface and completely absent from the ventral surface of mutant
hearts (Extended Data Fig. 9i-p). The partial and complete loss of
LECs correlated with the loss of Prox1 protein expression (Extended
Data Fig. 9k, o). Tie2-Cre;Prox1fl/fl mutant hearts were also recovered
at E17.5 with significant coverage of targeted GFP⁺/Lyve-1⁺ lympha-
tics which correlated with incomplete knockdown of Prox1 (Extended
Data Fig. 7c). The resultant phenotype was mild hypoplasia of the
lymphatic vessels and a partially truncated vascular network
(Extended Data Fig. 10a-f). Vessels were significantly shorter and
thinner with increased truncations relative to controls (Extended
Data Fig. 10m-o). Immunostaining for cleaved caspase-3 revealed
an increase in apoptotic cells within the termini of mutant vessels
(Extended Data Fig. 8k, l) supporting the requirement for Prox1 in
LEC survival and maintenance. Nevertheless, hypomorphic Tie2-
Cre;Prox1fl/fl mutants were recoverable at postnatal stages, whereby
hypoplasia of the cardiac lymphatics appeared to be rescued beyond
birth (Extended Data Fig. 8m-p).

We next targeted Prox1 within the Vav1⁺ lineage. Specification of
lymphatic vessels in severely affected Tie2-Cre;Prox1fl/fl hearts at the
base of the heart on the dorsal surface (Extended Data Fig. 9c, l)
corresponded to the potential contribution of Vav1⁺ to the cardiac
lymphatics (Fig. 3). In Vav1-Cre;Prox1fl/fl mutant hearts at E14.5,
emerging VEGFR-3⁺ cardiac lymphatics were evident on both ventral
and dorsal surfaces (Extended Data Fig. 8q-x). At E17.5, control
Vav1-Cre;Prox1fl/+ mice revealed appropriate targeting of GFP⁺
LECs and an extensive lymphatic network on the ventral surface as
indicated by Lyve-1, with retained Prox1 expression (Extended Data
Fig. 9q-s). Vav1-Cre;Prox1fl/fl mutants revealed no obvious systemic
vessel defects (Extended Data Fig. 7j-m). Co-expression of GFP⁺/
Lyve-1⁺ was observed in LECs (Extended Data Fig. 9t-x); however,
specific loss of Lyve-1⁺ LECs was detected at subcellular resolution
that directly correlated with loss of Prox1 and GFP-targeting
(Extended Data Fig. 9y, z), supporting a Prox1-dependent Vav1⁺
source of cardiac lymphatics.


Neo-lymphangiogenesis post-cardiac injury 

Lymphangiogenesis in other settings (most notably during skin infec- 
tion) has been implicated in antigen clearance and inflammatory 
resolution***’. Thus, we determined whether the cardiac lymphatics 
might attempt compensatory angiogenesis during the pro- 
inflammatory phase following myocardial infarction (MI). We first 
analysed VEGFR-3 protein levels as a surrogate for an early lymphatic 
response, and observed a significant increase in VEGFR-3 at all stages 
from 24 h up to 21 days post-MI (Fig. 4a). Alterations in VEGFR-3 
protein levels were recapitulated at the gene-expression level (Fig. 4b) 
and a general activation of the developmental lymphatic gene pro-
gram was confirmed by concomitant increased expression of Lyve1
and Prox1 (Fig. 4c, d). At day 7 following injury there was a significant
increase in the branching of surface VEGFR-3⁺ lymphatic vessels
(Fig. 4e, f), and alignment of Prox1⁺ lymphatic sprouting with
Emcn⁺ veins (Fig. 4g). Longitudinal analyses, from days 7 to 35
post-MI, revealed marked spatiotemporal changes in the lymphatic 
response. In the intact heart there were few superficial lymphatic 


Figure 4 | Myocardial infarction induces a significant cardiac
lymphangiogenic response that can be enhanced by VEGF-C-stimulation
to promote functional improvement. a, VEGFR-3 protein levels increased
from 24 h to 21 days post-MI, peaking at day 4 (D4) (n = 3 animals analysed per
time point; single representative western blot with densitometry). b-d, Real-
time analysis of Vegfr3 (b), Lyve1 (c) and Prox1 (d) mRNA all revealed a
significant increase in expression levels across the equivalent time-points post-
MI (n = 3 animals per time point). e, f, VEGFR-3 whole-mount staining
revealed increased lymphangiogenesis in the left ventricle (LV), proximal to the
infarct 7 days post-MI (e, f; black arrowheads indicate areas in left ventricle with
increased lymphangiogenesis, white arrowheads indicate areas with reduced
lymphatic vessel density; n = 3 mice per group). g, Sprouting of Prox1⁺
lymphatics was observed aligning with Emcn⁺ veins 7 days post-MI (white
asterisk, ligating suture (see Methods)). h-q, Short-axis sections at day 7 post-
MI revealed Lyve-1⁺/Pdpn⁺ lymphangiogenesis in the scar region (white
boxes in h, j, l, n, p), which was significantly increased relative to the intact
heart (h, i) and which expanded through days 14 (m), 21 (o) and 35 with large
lymphatic ‘shunts’ evident in the left ventricle (q; n = 5 hearts analysed).
i, k, m, o, q, Enlarged images of the boxed areas in h, j, l, n and p, respectively.
RV, right ventricle. r-u, Whole-mount X-gal staining of Vegfr3lacZ/+ hearts
after administration of VEGF-C reveals the lymphangiogenic response post-MI
(n = 3 per treatment group). r, s, t, Mice treated with recombinant human
VEGF-C(C156S) exhibited extensive lymphangiogenesis in the injury area
(t, black arrowheads and white inset box enlarged in the top image of panel
u) compared with vehicle-treated (s, black arrowheads and white inset box
enlarged in the bottom image of panel u) or sham-operated mice
(r). v, w, Whole-mount DAB staining of MI hearts with or without VEGF-C
administration with VEGFR-3 (v) or Prox1 (w) confirmed the observations in
r-u (white asterisks indicate the ligating suture (see Methods)).
x, y, Longitudinal MRI analyses of infarcted hearts 21 days after surgery
following treatment with either vehicle (x) or VEGF-C (y). z, Ejection fraction
measurements revealed a significant improvement in VEGF-C-treated hearts,
compared to vehicle, at 14 and 21 days post-MI. n = 8 wild-type mice per
treatment group. All graphs show mean ± s.e.m. Data analysed with Student’s
t-test; *P = 0.05. Scale bars: e, f, 1 mm; g, 400m; h, j, l, n, p, 200 µm; i, k, m, o, q,
400m; r-t, 1 mm; u-w, 500m; x, y, 2 mm.
Table 1 | Functional parameters from longitudinal MRI of VEGF-C- and vehicle-treated hearts post-MI

Parameter | 7 days post-MI, VEGF-C (n = 8) | 7 days post-MI, PBS (n = 8) | 14 days post-MI, VEGF-C (n = 8) | 14 days post-MI, PBS (n = 8) | 21 days post-MI, VEGF-C (n = 8) | 21 days post-MI, PBS (n = 8) | 28 days post-MI, VEGF-C (n = 8) | 28 days post-MI, PBS (n = 8)
Body weight (g) | 19.0 ± 0.6 | 19.8 ± 0.5 | 19.6 ± 0.7 | 20.5 ± 0.5 | 19.8 ± 0.5 | 20.4 ± 0.4 | 20.9 ± 0.7 | 21.3 ± 0.6
Heart rate (b.p.m.) | 515 ± 10 | 499 ± 19 | 511 ± 6 | 496 ± 19 | 510 ± 9 | 494 ± 10 | 469 ± 52 | 496 ± 26
Left ventricle
End diastolic volume (µl) | 49.6 ± 4.4 | 68.9 ± 11.1 | 48.3 ± 4.0 | 70.9 ± 10.9 | 56.6 ± 5.4 | 72.4 ± 11.5 | 60.9 ± 7.4 | 73.3 ± 13.7
End systolic volume (µl) | 23.1 ± 4.8 | 40.3 ± 9.7 | 21.5 ± 4.1* | 43.5 ± 10.0* | 24.0 ± 5.3 | 42.1 ± 10.2 | 27.9 ± 7.6 | 42.3 ± 12.1
Stroke volume (µl) | 26.5 ± 2.3 | 28.6 ± 2.3 | 26.8 ± 1.5 | 27.3 ± 1.5 | 32.6 ± 1.6 | 30.0 ± 2.7 | 32.9 ± 2.0 | 31.0 ± 3.2
Ejection fraction (%) | 56 ± 6 | 46 ± 5 | 57 ± 4* | 43 ± 5* | 60 ± 5* | 45 ± 5* | 56 ± 6 | 45 ± 6
Cardiac output (ml min⁻¹) | 13.5 ± 0.9 | 14.3 ± 1.4 | 13.7 ± 0.6 | 13.6 ± 1.0 | 16.6 ± 0.8 | 14.7 ± 1.2 | 15.4 ± 2.2 | 15.5 ± 2.1
Left ventricular mass (mg) | 80 ± 5 | 96 ± 8 | 80 ± 4 | 90 ± 7 | 88 ± 6 | 90 ± 4 | 94 ± 9 | 95 ± 6
Absolute infarct size (mm?) | 13.6 ± 5.6 | 36.4 ± 11.4 | 14.2 ± 5.8 | 35.0 ± 10.8 | 14.1 ± 5.3 | 34.5 ± 11.0 | 13.5 ± 6.7 | 28.9 ± 12.0
Relative infarct size (%) | 10 ± 4 | 21 ± 5 | 11 ± 4 | 20 ± 5 | 10 ± 3 | 21 ± 6 | 9 ± 4 | 18 ± 7

Data presented as mean ± standard error of the mean. Asterisks indicate significant differences between VEGF-C- and PBS-treated hearts; P = 0.05; repeated measures t-test, two-tailed distribution, two-sample equal variance.
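
As an illustration of the "two-sample equal variance" comparison named in the table footnote, the sketch below computes a pooled-variance t statistic from summary values alone (group mean, s.e.m. and n). It is a simplified stand-in, not the authors' analysis: a repeated-measures test needs the per-animal data, which are not given here, and the function name `pooled_t_from_summary` is hypothetical. The example inputs are the 14-day ejection fractions (57 ± 4% versus 43 ± 5%, n = 8 per group).

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Two-sample, equal-variance t statistic from means, s.e.m. values and group sizes.

    Returns (t statistic, degrees of freedom).
    """
    # Recover each group's variance from its s.e.m.: s.d.^2 = s.e.m.^2 * n
    var1 = (sem1 ** 2) * n1
    var2 = (sem2 ** 2) * n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df


if __name__ == "__main__":
    t_stat, df = pooled_t_from_summary(57, 4, 8, 43, 5, 8)
    print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting statistic can be compared with a tabulated two-tailed critical value for the stated degrees of freedom; because the inputs are rounded summary values, it will only approximate a test run on the raw measurements.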


vessels detectable (Fig. 4h, i), as evident from staining for Lyve-1 and 
podoplanin (Pdpn (ref. 33)) as compared to day 7 after injury when 
there was a significant increase in the number of Lyve-1⁺/Pdpn⁺
lymphatic vessels in cross-section (Fig. 4j, k). These vessels increased
in diameter by day 14, concurrent with nascent lymphatic network
expansion (Fig. 4l, m), and persisted through day 21 (Fig. 4n, o) to day
35, where enlarged lymphatic shunts were evident, localized super- 
ficially in the myocardium at the border zone of the infarct/scar region 
(Fig. 4p, q). Thus, the adult cardiac lymphatics undergo significant 
angiogenesis following initiation of a developmental programme in 
response to ischaemic injury. 


VEGF-C improves cardiac function post-MI 


To investigate the influence of neo-lymphangiogenesis on cardiac 
function after MI, we treated wild-type or Vegfr3lacZ/+ reporter mice™
with recombinant VEGF-C(C156S)”* at days 0, 2, 3, 4 and 6 post-MI.
At day 7 post-MI, a stronger lymphangiogenic response (X-gal⁺/
VEGFR-3⁺/Prox1⁺) was observed surrounding the injury area in
VEGF-C-treated samples, compared to vehicle-treated controls
(Fig. 4r-w). Moreover, VEGF-C-treated mice exhibited a significant
improvement in cardiac function as determined by longitudinal MRI
(Fig. 4x-z and Table 1). Specifically, smaller ventricular end-systolic
volumes (Fig. 4x, y and Table 1) and significant improvement in the
ejection fraction were recorded in the VEGF-C-treated group (Fig. 4z
and Table 1; 43 ± 5% for vehicle versus 57 ± 4% for VEGF-C-treated
by 14 days post-MI; 45 ± 5% for vehicle versus 60 ± 5% for VEGF-C-
treated by 21 days post-MI; mean ± s.e.m.; n = 8 animals per group;
P = 0.05). The improvement in cardiac function was maintained for at
least 28 days post-MI (Table 1). Collectively, these data suggest that 
promotion of growth-factor-induced lymphangiogenesis is possible 
in the adult diseased heart and improves prognosis, analogous to what 
has been reported in other disease models”. 
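
The functional parameters discussed above are linked by standard definitions: stroke volume is end-diastolic minus end-systolic volume, ejection fraction is stroke volume as a percentage of end-diastolic volume, and cardiac output is stroke volume times heart rate. The sketch below applies these textbook relations to the 7-day PBS group means from Table 1 (end-diastolic volume 68.9 µl, end-systolic volume 40.3 µl, heart rate 499 b.p.m.); it is an illustration under those definitions, not the authors' MRI analysis pipeline, and the function names are hypothetical.

```python
def stroke_volume_ul(edv_ul: float, esv_ul: float) -> float:
    """Stroke volume (µl) = end-diastolic volume minus end-systolic volume."""
    return edv_ul - esv_ul


def ejection_fraction_percent(edv_ul: float, esv_ul: float) -> float:
    """Ejection fraction (%) = stroke volume / end-diastolic volume x 100."""
    if edv_ul <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * stroke_volume_ul(edv_ul, esv_ul) / edv_ul


def cardiac_output_ml_min(stroke_volume: float, heart_rate_bpm: float) -> float:
    """Cardiac output (ml/min) from stroke volume (µl) and heart rate (beats per minute)."""
    return stroke_volume * heart_rate_bpm / 1000.0


if __name__ == "__main__":
    edv, esv, hr = 68.9, 40.3, 499.0  # 7-day PBS group means from Table 1
    sv = stroke_volume_ul(edv, esv)
    print(f"stroke volume: {sv:.1f} µl")
    print(f"ejection fraction from group means: {ejection_fraction_percent(edv, esv):.1f}%")
    print(f"cardiac output: {cardiac_output_ml_min(sv, hr):.1f} ml/min")
```

An ejection fraction computed from group means need not equal the tabulated group value, which is presumably averaged over per-animal ejection fractions.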


Discussion 


Our study challenges the unequivocal view of lymphatic vessel 
development derived from Sabin’s model of venous origin’. We reveal 
that the lymphatic vasculature of the embryonic mouse heart com- 
prises a heterogeneous make-up of cell populations, with contribu- 
tions derived from both extra-cardiac venous endothelium and a 
novel source of lymphatic progenitors which may arise from the yolk 
sac haemogenic endothelium. Targeting of Prox1 in both venous 
endothelial and non-venous-derived compartments resulted in loss 
of the cardiac LECs, supporting a dual origin in the developing heart 
and consistent with previous studies demonstrating that Prox1 is both 
necessary and sufficient to drive LEC fate specification. Prox1 acts 
at the decision point between blood and lymphatic endothelial cell 
specification, such that Prox1-deficient LECs contributing to the 
systemic blood vasculature resulted in ectopic vessels and haem-
orrhaging throughout the embryo. However, in Tie2-Prox1 mutant 
hearts, hypoplasia of the lymphatic vessels did not appear to impact 
upon the gross development of the coronary blood vessels, highlighting 
a further unique ontology of the cardiac lymphatics relative to 
systemic lymphatic vasculature. Previously, Prox1 dosage effects 
underpinned formation of the systemic lymphovenous valves; here 
partial Prox1 knockdown resulted in formation of the cardiac lymphatics 
but with truncation of the developing plexus and aberrant 
remodelling, suggesting a novel role for Prox1 in maintaining the 
cardiac lymphatic network. 

Insight into the embryological origin and development of the car- 
diac lymphatics has important implications for understanding cardio- 
vascular tissue fluid homeostasis, injury-induced inflammation and 
disease. Following MI the cardiac lymphatics underwent a profound 
angiogenic response, accompanied by an upregulation in the lymph- 
atic development gene program. Significantly, this was enhanced by 
ectopic VEGF-C stimulation following injury, leading to improve- 
ment in cardiac function. Myocardial injury is associated with a 
robust immune reaction, characterized by sequential mobilization 
of monocytes involved in inflammatory functions and wound heal- 
ing’. Lymphangiogenesis in inflammatory settings facilitates the 
resolution of tissue oedema and promotes macrophage mobiliza- 
tion*®*’, and induction by VEGF-C alleviates inflammation in mouse 
models***’. Therefore, mechanisms coupling lymphatic development 
to immune regulation represent a therapeutic target. Induction of 
lymphatic vessels could provide a pathway for inflammatory cell 
efflux to tip the balance in favour of wound healing within the injured 
adult heart. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Mouse strains. The following mouse strains were used as previously described: 
Csf1r-CreER, Mesp1-Cre, Nkx2.5-Cre, Pdgfb-CreERT2, Pdgfrb-Cre, 
Prox1fl/+ (ref. 29), R26R-eYFP, R26R-mTmG, R26R-tdTomato, Tie2-Cre, 
Vav1-Cre, Wt1-CreERT2, Wnt1-Cre, Vegfr3lacZ/+ (ref. 34). Breeding was carried 
out using only Cre+ males for all Cre strains except Vav1-Cre, for which Cre+ 
females were used. Pregnant females crossed to inducible Cre male studs were 
injected intraperitoneally at E7.5 (Csf1r-CreER) or E9.5 (Wt1-CreERT2; 
Pdgfb-CreERT2) with 2 mg of 4-hydroxytamoxifen (4-OHT) dissolved in peanut oil. 
Embryonic staging was determined by the day of the vaginal plug (E0.5). C57BL/6 
mice were used for the longitudinal cardiac cine-MRI study. Investigators were 
blinded to genotype and treatment groups. All animal experiments were carried out 
according to UK Home Office project licence PPL 30/2987, compliant with the UK 
Animals (Scientific Procedures) Act 1986. 

Quantitative real-time PCR. Total RNA was isolated from hearts using the 
Qiagen RNeasy Mini Kit (Qiagen). Complementary DNA was synthesized using 
the Reverse Transcription System (Promega), following the manufacturer’s 
instructions, and used for quantitative real-time PCR using SYBR Green on an 
ABI 7900 for the following genes: Vegfr3, Prox1, Lyve1. Fold change was 
determined by applying the 2^(−ΔΔCt) method. The following primer sequences 
were used: 
Vegfr3, 5'-CCATCGAGAGTCTGGACAGC-3' forward, 5'-CCGGGATGGTGGTCACATAG-3' reverse; 
Prox1, 5'-GAAGGGCTATCACCCAATCA-3' forward, 5'-TGAACCACTTGATGAGCTGC-3' reverse; 
Lyve1, 5'-GGCTTTGAGACTTGCAGCTATG-3' forward, 5'-GCAGGAGTTAACCCAGGTGT-3' reverse. 
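
As an illustration of the fold-change calculation named above, the following is a minimal sketch of the 2^(−ΔΔCt) arithmetic. The reference (housekeeping) gene is not named in this section, so the `ct_ref_*` arguments and all Ct values in the example are hypothetical placeholders, not data from the study.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    ct_* are threshold-cycle values; the reference gene is an
    assumption (the Methods do not state which one was used).
    """
    # Normalise the target gene to the reference gene in each group.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Difference between the normalised sample and control values.
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold one cycle
    # later in the sample than in the control (same reference Ct).
    print(fold_change_ddct(24.0, 18.0, 23.0, 18.0))
```

A ΔΔCt of one cycle corresponds to a halving of relative expression, which is why fold changes below 1 (such as those quoted in the Extended Data) indicate reduced transcript levels.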

Western blotting. Heart samples were lysed in RIPA buffer (50 mM Tris-HCl at 
pH 7.6, 150 mM NaCl, 1% NP-40, 0.5% DOC, 0.1% SDS) supplemented with 
protease inhibitors (Protease Inhibitor Cocktail Tablet (Roche), 1 mM PMSF 
(Sigma) and 1 µg ml−1 aprotinin (Sigma)). The lysate was centrifuged at 13,000g 
for 15 min at 4 °C and the supernatant recovered. For SDS-PAGE, samples were 
incubated with an equal volume of 2× Laemmli buffer/5% β-ME at 95 °C for 5 min 
before being resolved on a 10/15% acrylamide gel (Sigma) and analysed by western 
blot using a primary antibody for VEGFR-3 (goat anti-mouse, 1:1,000 dilution) or 
GAPDH (goat anti-mouse, 1:1,000 dilution, Millipore). All secondary antibodies 
were conjugated to HRP and imaged using enhanced chemiluminescence (all GE 
Healthcare). 

Immunohistochemistry and histology, confocal imaging and quantitation. 
Hearts and embryos for histology were collected, fixed in 2% PFA overnight 
and either stored in PBS, or embedded in either paraffin wax or OCT (both 
Raymond Lamb). Ten-micrometre paraffin sections were stained with 
haematoxylin (Sigma) and eosin (Raymond Lamb). Immunofluorescent staining on 
8-µm frozen sections was performed using primary antibodies to endomucin 
(catalogue number SC-53941, Santa Cruz Biotechnology, 1:50 dilution), CD31 
(553370, BD Pharmingen, 1:50), Prox1 (11-002, AngioBio, 1:200), Prox1 
(AF2727, R&D Systems, 1:200), Lyve-1 (NBP1-43411, Novus Biologicals, 
1:500), Lyve-1 (11-034, AngioBio, 1:200), VEGFR-3 (AF743, R&D Systems, 
1:50), GFP (ab13970, Abcam, 1:2,000), Podoplanin (10R-P133a, Fitzgerald, 
1:500), α-smooth muscle actin (C6198, Sigma, 1:200), cleaved caspase-3 
(9661/9664, Cell Signalling Technology, 1:100). AlexaFluor secondary antibodies 
(Invitrogen, 1:200) were used in all cases. The same protocol was applied to 
whole-mount hearts and embryos with primary and secondary antibody incubations 
extended to overnight. Whole-mount 3,3'-diaminobenzidine (DAB) staining was 
performed on embryonic and postnatal hearts using the Vectastain Elite ABC Kit 
Goat IgG and the DAB Peroxidase Substrate Kit (both Vector Laboratories) 
following the manufacturer’s instructions. Hearts from Vegfr3lacZ/+ mice were 
collected post-MI and stained for β-galactosidase (β-gal) activity. In brief, hearts 
were fixed on ice for 30 min in 2% formaldehyde solution containing 0.2% 
glutaraldehyde (both Sigma), washed twice with PBS on ice, and stained 
overnight at room temperature in X-gal staining solution containing 4 mM 
K4Fe(CN)6, 4 mM K3Fe(CN)6, 2 mM MgCl2 and 1 mg ml−1 X-gal (dissolved 
in N,N-dimethylformamide; Sigma). Immunofluorescence staining was imaged using 
an Olympus FV1000 confocal microscope. Maximum intensity z-projections of 
whole hearts were acquired using both the tiling and z-stack functions. DAB and 
β-gal staining was imaged using a Zeiss stereo microscope. All images were 
processed using ImageJ software. Analysis of vessels and branching calculations 
were performed using AngioTool. Cell lineage contribution was quantified by 
counting Prox1+ reporter+ nuclei versus singly Prox1+ nuclei within vessels 
across several fields of view per heart analysed. 

Flow cytometry. Isolation of eGFP+ cells from Tie2-Cre;Prox1-eGFPfl/+ and 
Tie2-Cre;Prox1-eGFPfl/fl hearts and RNA extraction were performed according 
to standard protocols. 

Yolk sac explants. E8.0 explants including the intact yolk sac were cultured for 5 
days in Dulbecco’s Modified Eagle Medium with 20% fetal calf serum and 10> 
mol l−1 2-mercaptoethanol (Life Technologies) supplemented with 100 ng ml−1 
of recombinant human VEGF-C(C156S) (R&D Systems). 

Murine cardiac injury model. Vegfr3lacZ/+ (ref. 34) or C57BL/6 female mice were 
subject to surgery between 8 and 10 weeks of age, with a weight of 17-23 g. Mice 
were anaesthetized with 2.5% isoflurane and placed under assisted external 
ventilation through the insertion of an endotracheal tube. Cardiac injury was 
induced by permanent ligation of the left anterior descending artery (LAD). 
LAD-ligation mice were directly compared with sham-operated animals, which 
underwent tracheotomy, opening of the chest and insertion of the needle through 
the left ventricle but no suture ligation. Buprenorphine (buprenorphine 
hydrochloride; Vetergesic) was delivered as a 0.015 mg ml−1 solution via 
intraperitoneal injection at 20 min before the procedure to provide analgesia. 
On recovery, mice were randomly allocated to receive an intraperitoneal 
injection of 0.1 µg g−1 recombinant human VEGF-C(C156S) (R&D Systems) or PBS. 
Further injections were administered at 2, 3, 4 and 6 days post-surgery. 
Experimenters were blind to treatment groups for subsequent cardiac cine-MRI 
and analysis. Hearts were collected at 1, 2, 4, 7, 14, 21, 28 and 35 days 
post-MI and either sectioned or left intact and prepared for histology, 
immunofluorescence, RNA and protein extraction. Mice were housed and maintained 
in a controlled environment. All surgical and pharmacological procedures were 
performed in accordance with the Animals (Scientific Procedures) Act 1986 
(Home Office, UK). 

Cardiac cine-MRI. Cardiac cine-MRI was performed post-LAD ligation as 
described. In brief, mice were anaesthetized with 2% isoflurane in O2 and 
positioned supine in a purpose-built cradle. ECG electrodes were inserted into 
the forepaws and a respiration loop was taped across the chest. The cradle was 
lowered into a vertical-bore, 11.7 T magnetic resonance system (Magnex 
Scientific) with a 40 mm birdcage coil (Rapid Biomedical) and a Bruker console 
running Paravision 2.1.1 (Bruker Medical). A stack of contiguous 1-mm-thick 
true short-axis ECG-gated cine-FLASH images was acquired to cover the entire 
left ventricle (TE/TR 1.43/4.6 ms; 17.5° pulse; field of view 25.6 × 25.6 mm; 
matrix size 128 × 128 zero-filled to 256 × 256, giving a voxel size of 100 × 100 
× 1,000 µm; 20 to 30 frames per cardiac cycle). Long-axis two-chamber and 
four-chamber images were also acquired. 
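
The quoted voxel size follows directly from the field of view and matrix size, and the short sketch below reproduces that arithmetic. The function name is illustrative only; the numbers are those given in the acquisition parameters above.

```python
def in_plane_resolution_um(fov_mm, matrix_points):
    """In-plane pixel size in micrometres for a square field of view."""
    return fov_mm / matrix_points * 1000.0


if __name__ == "__main__":
    fov_mm = 25.6            # field of view, from the text
    slice_thickness_um = 1000.0  # 1-mm slices, from the text

    acquired = in_plane_resolution_um(fov_mm, 128)      # as acquired
    interpolated = in_plane_resolution_um(fov_mm, 256)  # after zero filling

    print(f"acquired: {acquired:.0f} um in-plane")
    print(f"zero-filled: {interpolated:.0f} x {interpolated:.0f} "
          f"x {slice_thickness_um:.0f} um")
```

Zero filling interpolates the image onto a finer grid, so the reported 100 µm in-plane voxel dimension is the interpolated value; the data as acquired on the 128 × 128 matrix correspond to roughly 200 µm in-plane.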

MRI data analysis. Blinded image analysis was performed using ImageJ (NIH). 
Left ventricular mass, volumes and ejection fraction were calculated as 
described. The relative infarct size was calculated from the average of the 
endocardial and epicardial circumferential lengths of the thinned, akinetic 
region of all slices, measured at diastole, and expressed as a percentage of 
the total myocardial surface. 
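
For orientation, ejection fraction is conventionally derived from the end-diastolic and end-systolic volumes. The sketch below uses that standard definition; it is not taken from the cited analysis protocol, and the volumes in the example are hypothetical.

```python
def ejection_fraction(end_diastolic_volume, end_systolic_volume):
    """Ejection fraction (%) from ventricular volumes in the same units.

    Standard definition: stroke volume divided by end-diastolic volume.
    """
    if end_diastolic_volume <= 0:
        raise ValueError("end-diastolic volume must be positive")
    stroke_volume = end_diastolic_volume - end_systolic_volume
    return 100.0 * stroke_volume / end_diastolic_volume


if __name__ == "__main__":
    # Hypothetical volumes in microlitres, for illustration only.
    print(ejection_fraction(50.0, 20.0))
```

Under this definition, a smaller end-systolic volume at a similar end-diastolic volume yields a higher ejection fraction, which matches the direction of the changes reported for the VEGF-C-treated group.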

Statistical analysis. No statistical methods were used to predetermine sample 
size. Statistical difference between groups was evaluated using Student’s t-test 
(two-tailed) or one-way ANOVA. A P value of <0.05 was considered statistically 
significant. All values and graphs present the mean value ± s.e.m. 
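
A minimal sketch of the summary statistics named above (mean ± s.e.m. and the two-sample Student's t statistic with pooled variance), using only the Python standard library. The sample values are hypothetical, and the step from the t statistic to a two-tailed P value is left to a statistics package (for example `scipy.stats.ttest_ind`), since it needs the t distribution.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, n_a + n_b - 2


if __name__ == "__main__":
    # Hypothetical measurements for two groups, for illustration only.
    vehicle = [1.0, 2.0, 3.0]
    treated = [2.0, 3.0, 4.0]
    print(mean_sem(vehicle))
    print(student_t(vehicle, treated))
```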
Extended Data Figure 1 | Molecular characterization of the murine cardiac 
lymphatic vasculature. a–j, Whole-mount DAB staining of hearts (n = 3 per 
time point) with the lymphatic marker VEGFR-3 revealed cardiac lymphatic 
vessels first sprout from the region of the sinus venosus, on the dorsal side of the 
heart at E14.5 (a, white box, enlarged in b). At E16.5, ventrally the first small 
vessels arose between the atria (c), while the main dorsal vessels spread 
inferiorly from the sinus venosus at the inflow region of the heart (d). At E18.5 
the network appears similar with little expansion (e, f). From birth (P0) 
lymphatic vessels branch and expand onto the ventral epicardial surface of the 
heart such that by P10 the network has expanded markedly, coincident with 
cardiac growth (g, h). Consistent with the systemic lymphatic vasculature, 
cardiac lymphatic vessels are fully developed by P15 (i, j), with no difference in 
vessel density at later stages (data not shown). k–n, Whole-mount DAB 
staining of E17.5 hearts with the lymphatic marker Prox1 (n = 4) further 
confirmed extensive spread of the sprouting lymphatics inferiorly from the 
outflow tract region (k) and sinus venosus, at the inflow region of the heart 
(l). White inset boxes in k and l are shown in m and n, respectively, highlighting 
the punctate nuclear expression of Prox1 in coronary lymphatics. o–v, Whole-
mount confocal imaging of E17.5 hearts (n = 4) stained with VEGFR-3, Prox1 
and Lyve-1 confirmed co-labelling of coronary lymphatic vessels. Note that 
while at this developmental stage VEGFR-3 is restricted to LECs (p, t), Prox1 is 
also expressed in the underlying myocardium (o, s) and Lyve-1 labels 
tissue-resident macrophages (q, u). Scale bars: a, 750 µm; b, 300 µm; c, d, 750 µm; 
e, f, 1 mm; g, h, 2 mm; i, j, 2.5 mm; k, l, 400 µm; m, n, 200 µm; o, s, 100 µm. 
Extended Data Figure 2 | Cardiac lymphatic vessels do not emerge from the 
developing coronary vasculature. a–o, Whole-mount confocal imaging of 
hearts stained with Emcn (vessels) and Prox1 (lymphatics) revealed sprouting 
of Prox1+ lymphatics from extra-cardiac tissue neighbouring the sinus venosus 
on the dorsal side of the developing heart at E12.5-13.5 (a, d), but no Prox1+ 
LECs were observed budding from Emcn+ coronary vessels (c and f; white 
arrowheads in b and e highlight Prox1+ LECs). Prox1+ lymphatics had reached 
the sinus venosus by E13.5 (white arrow in f) and the outflow tract, on the 
ventral side of the heart by E14.5 (white arrow in h); no Prox1+ LECs were 
observed emerging from Emcn+ vessels on the ventral side at E14.5 
(g–i). Background-like labelling on the ventricular surface in c, f and i reflects 
Prox1 expression in the developing myocardium. Between E15.5 and E17.5, Prox1+ 
lymphatics aligned with Emcn+ coronary veins but no contribution of Prox1+ 
LECs was observed (j, l and n, white boxes enlarged in k, m and o, respectively; 
n = 5 hearts analysed per time point). Scale bars: a, 550 µm; c, f, i, 250 µm; 
d, g, 750 µm; j, l, n, 400 µm; k, m, o, 200 µm. 
Extended Data Figure 3 | The common cardinal vein contributes LECs that 
migrate towards the sinus venosus and outflow tract of the developing heart. 
a–c, Whole-mount confocal analysis of E10.5 embryos stained with Emcn or 
Prox1 and VEGFR-3 revealed Prox1+/VEGFR-3+ LECs emerging along the 
common cardinal vein (a, red box enlarged in b; white box enlarged in 
c) migrating towards the sinus venosus (white arrowheads in c; n = 3 embryos). 
d–f, Whole-mount DAB staining revealed Prox1+ LECs migrating towards the 
outflow tract, on the ventral surface of the developing heart at E12.5 (d, white 
inset box enlarged in e; alternative lateral view in f; white arrowheads indicate 
migrating LECs; n = 4 embryos). ba, branchial arch; ccv, common cardinal 
vein; fl, forelimb; h, heart; isv, inter-somitic vessel; la, left atrium; lv, left 
ventricle; oft, outflow tract; paa, pharyngeal artery arch; ra, right atrium; rv, 
right ventricle. Scale bars: a, 1 mm; b, 500 µm; c, 200 µm; d, 600 µm; e, 400 µm; 
f, 300 µm. 
Extended Data Figure 4 | Tie2-Cre efficiently labels the developing cardinal 
vein and partial contribution of Pdgfb+-derived LECs indicates a non-
venous contribution to the developing cardiac lymphatics. a–d, Tie2-
Cre;R26R-eYFP lineage tracing revealed recombination and labelling of the 
cardinal vein and jugular lymph sacs at E10.5 (a; n = 3 embryos analysed). 
Plane of section to capture jugular lymph sacs is shown in b. White inset box in 
a is shown at higher magnification and demarcated by GFP (c) and Emcn 
(d) co-staining. e–o, Schematic (e) to show how embryos were generated by 
breeding Pdgfb-CreERT2 mice with either R26R-tdTomato (f–l) or R26R-mTmG 
(m–o) reporter mice and then being injected with 4-hydroxytamoxifen 
(4-OHT) at E9.5, before venous sprouting. Whole-mount confocal analysis of 
E17.5 hearts (n = 4) stained with Lyve-1 revealed incomplete tdTomato 
recombination in cardiac lymphatic vessels (f). Both Pdgfb+ (g–i) and Pdgfb− 
(j–l; m–o) lymphatic vessels were observed, highlighted by the dotted green 
outlines (g, j, m), indicating a combined Pdgfb+ endothelial origin and Pdgfb− 
non-venous source for the cardiac LECs. Scale bars: a, 200 µm; b, 1.5 mm; 
c, d, 50 µm; f, 400 µm; e, l, o, 100 µm. 
Extended Data Figure 5 | Neither the pro-epicardial organ, cardiac 
mesoderm nor cardiac neural crest contribute LECs to the developing heart 
and dermal lymphatics are not derived from the Vav1+ lineage. a–l, Lineage 
tracing using Wt1-CreERT2;R26R-eYFP (4-hydroxytamoxifen injected at 
E9.5; a–c), Mesp1-Cre;R26R-eYFP (d–f), Nkx2.5-Cre;R26R-eYFP (g–i) and 
Wnt1-Cre;R26R-eYFP (j–l; n = 3 hearts analysed per lineage trace) showed no 
YFP recombination in cardiac lymphatic vessels as marked by Prox1 or Lyve-1, 
suggesting that neither the pro-epicardial organ/epicardium, cardiac 
mesoderm (early or late) nor cardiac neural crest, respectively, contribute LECs 
to the developing cardiac lymphatics. m–o, Embryos generated by breeding 
Vav1-Cre with R26R-tdTomato reporter mice were subject to whole-mount 
confocal analysis of E17.5 dorsal skin preparations (n = 4 Vav1-tdTomato+ 
embryos analysed). tdTomato epifluorescence (m) and Prox1 immunostaining 
(n) revealed a lack of Vav1-Cre recombination in dermal lymphatic vessels 
(highlighted by the green dotted lines, m) and a lack of overlap of tdTomato 
with Prox1 and Lyve-1 expression (o; all Prox1+ nuclei assessed across 5 fields 
of view per embryonic skin; n = 4 skins in total). p–r, Higher magnification of 
inset white box (n) revealed that tdTomato+ cells (p) did not overlap with 
Prox1+ nuclei in the lymphatic vessels (q) (white arrowhead highlights 
tdTomato+/Prox1− cell in r). Scale bars: a–l, 100 µm; m–o, 100 µm; p–r, 
50 µm. 
Extended Data Figure 6 | The Vav1+ lineage does not contribute to LECs 
emerging from the common cardinal or jugular veins but contributes to 
VEGF-C-induced LECs emerging from yolk sac explants. a–d, Vav1-
Cre;R26R-tdTomato lineage tracing revealed no recombination nor labelling of 
the nascent LECs budding from common cardinal vein endothelium in E10.0 
embryos (a) as confirmed by co-staining for Emcn, Prox1 and tdTomato 
fluorescence. White inset box in a is highlighted in enlarged panels (b–d; arrows 
indicate Prox1+ (blue) LECs delaminating from the common cardinal vein in 
b, c). In the sinus venosus region Vav1+ cells were evident but lacked Prox1 
expression, excluding an LEC identity (d). e–h, Vav1-Cre;R26R-eYFP lineage 
tracing revealed no recombination nor labelling of LECs forming the jugular 
lymph sacs in E12.5 embryos (e) as confirmed by co-staining for GFP, Emcn 
and Prox1. White inset box in e highlighted by individual GFP (f), Emcn 
(g) and Prox1 (h) staining (n = 3 embryos analysed per time point). ccv, 
common cardinal vein; jls, jugular lymph sac; jv, jugular vein; sv, sinus venosus. 
i, j, Representative staining for Prox1 and native tdTomato fluorescence of ex 
vivo cultures of explanted Vav1-Cre;R26R-tdTomato conceptuses at E8.0, 
including the intact yolk sac (i) and outgrowth of tdTomato+ cells (j). Explants 
were cultured with 100 ng ml−1 of recombinant VEGF-C(C156S) (R&D 
Systems), a potent selective lymphangiogenic cue that only signals via VEGFR-3 
(i). k–z, High-resolution images showing the specification of tdTomato+/Prox1+ 
LECs (indicated by white inset boxes) in the yolk sac explants (k–n) and in the 
surrounding cellular outgrowth (o–z) (tdTomato+ in red; 
Prox1+ in blue; single and merged channels shown). Co-staining was 
confirmed by z-stack reconstructions for each set of four high-resolution panels 
(n, r, v, z); n = 6 explants analysed. Scale bars: a, e, 50 µm; b–d and f–h, 12.5 
µm; i, 100 µm; j, 50 µm; m, q, u, y, 15 µm. 
Extended Data Figure 7 | Prox1 knockdown results in significantly 
decreased Vegfr3 and Lyve1, and Tie2-Cre;Prox1fl/fl mutant embryos exhibit 
superficial vascular defects whereas Vav1-Cre;Prox1fl/fl mutants have a 
normal systemic vasculature. a, Prox1 targeting via floxed excision of exons 1 
and 2 results in EGFP expression, thus labelling targeted cells. b, E17.5 hearts 
from either Tie2-Cre;Prox1fl/+ control embryos or Tie2-Cre;Prox1fl/fl mutants 
were grouped and digested to create a single-cell suspension for FACS. A total 
of 100,000 GFP+ cells were collected for each sample group. c–e, Relative gene 
expression was determined by qRT-PCR and revealed significantly decreased 
Prox1 (c; 0.59 fold), Vegfr3 (d; 0.39 fold) and Lyve1 (e; 0.22 fold) expression; n = 
5 hearts per sample group, analysed in triplicate; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 
0.001. All graphs are mean ± s.e.m. Statistical test used was Student’s t-test. 
f–i, Dissection of Tie2-Cre;Prox1fl/+ heterozygous (f; n = 6) and Tie2-
Cre;Prox1fl/fl mutant (g, h, i; n = 9) littermate embryos at E17.5 revealed gross 
vascular anomalies in the double-floxed mutants (three examples shown in 
g–i), with evidence of ectopic surface blood vessels (g, ectopic vessels 
highlighted by black arrowheads), a disrupted vascular network (h; black 
arrowheads indicate blood-filled superficial vessels) and either haemorrhaging 
(i; bleeding foci highlighted by black arrowheads) or blood-filled lymphatics, 
compared to littermate Prox1fl/+ controls (f). j–m, Vav1-Cre;Prox1fl/+ 
heterozygous (j, k; n = 5) and Vav1-Cre;Prox1fl/fl mutants (l, m; n = 8) revealed 
no obvious systemic vessel defects. Scale bars: g, i, m, 100 µm. 
Extended Data Figure 8 | The emergence of cardiac lymphatics at E14.5 is 
disrupted in Tie2-Cre;Prox1fl/fl mutant hearts, which are dysmorphic and 
exhibit elevated apoptosis of LECs; however, mutant embryos recover with 
normal cardiac lymphatics at post-natal stages. a–c, GFP+-targeted and 
Lyve-1+ LECs emerged from the base of the heart in the atrioventricular region 
at E14.5 in control Tie2-Cre;Prox1fl/+ hearts (LECs highlighted by white 
arrowheads in a and b; n = 3 hearts analysed). d–f, In mutant Tie2-
Cre;Prox1fl/fl hearts (n = 7) the GFP+ network was absent (d) and Lyve-1 only 
detected tissue-resident macrophages with an absence of lymphatics at the 
inflow base of the heart (arrows in e, f). g, h, Coronary vessels, as determined by 
whole-mount CD31 staining, were comparable between control Tie2-
Cre;Prox1fl/+ (g) and Tie2-Cre;Prox1fl/fl hearts (h). i, j, Haematoxylin and eosin 
staining of paraffin-embedded E17.5 hearts revealed that Tie2-Cre;Prox1fl/fl 
mutants (j; n = 3 analysed) were grossly smaller compared to control hearts 
(i), with lack of extension of the ventricles towards the apex, smaller chambers 
and thickening of the ventricular free wall (j). Normal membranous septation 
of the mutant ventricle (white asterisk) and valve leaflet formation (white 
arrowhead in j) indicate normal endocardial cushion development. k, l, Whole-
mount confocal imaging of hearts stained with GFP, cleaved caspase-3 and 
Prox1 revealed an increase in apoptotic cells within the termini of mutant 
coronary lymphatic vessels (white arrowheads in magnified panels), compared 
to control hearts, supporting the requirement for Prox1 in LEC identity and 
maintenance. n = 3 hearts analysed for histology and immunostaining. 
m–p, Whole-mount VEGFR-3 immunostaining of hearts isolated at P7 
revealed that Tie2-Cre;Prox1fl/+ heterozygotes (m, n) and Tie2-Cre;Prox1fl/fl 
mutants (o, p) have an equivalent normal cardiac lymphatic vasculature (n = 3 
hearts analysed per genotype). As such, the lymphatic hypoplasia and 
disruption of the vessel network, evident in mutant hearts at E17.5 (Extended 
Data Fig. 10), is rescued during the later stages of development and neonatal 
period. q–x, In Vav1-Cre;Prox1fl/fl hearts (n = 4) there was evidence of an initial 
formation of the cardiac lymphatics on both ventral (q, s) and dorsal 
(u, w) surfaces and the coronary vessels were unaffected (r, t, v, x). Scale bars: 
f, h, x, 400 µm; j, 1 mm; k, 400 µm; l, 50 µm; m, 500 µm. 


©2015 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 9 | Prox1 is essential for Tie2+- and Vav1+-derived 
cardiac lymphatics. a–d, In control Tie2-Cre;Prox1fl/+ mice at E17.5 (n = 6) 
there was an extensive lymphatic network on both the dorsal and ventral 
surfaces, as indicated by whole-mount VEGFR-3 immunostaining 
(a, b), whereas the lymphatic vessels were virtually absent in mutant Tie2-
Cre;Prox1fl/fl hearts (c, d; n = 9); a few vessels evident on the dorsal surface were 
consistent with LECs arising from a non-Tie2-targeted source (c, white 
arrowheads). Tie2-Cre;Prox1fl/fl mutant hearts were dysmorphic relative to 
controls (compare c, d with a, b). e, f, GFP+ staining indicated targeting of 
Prox1 in Tie2-Cre;Prox1fl/+ mice (e) and an expansive Lyve-1+ lymphatic 
network (f). g, h, Prox1 immunostaining confirmed expression in LECs in 
heterozygote controls (g; inset box shown at higher magnification in h). 
Co-expression of GFP/Lyve-1 and Prox1 was evident in LECs in addition to 
Lyve-1/Prox1 double-positive cells not targeted by Tie2-Cre (h, white arrowheads). 
i–l, In contrast, Tie2-Cre;Prox1fl/fl mutant hearts revealed an absence of the 
GFP+ lymphatic network with only a minor contribution of Lyve-1+ LECs 
evident at the base of the heart on the dorsal surface (i, j; white arrowhead in 
j highlights retained Lyve-1+ LECs), which were Prox1+ (k, l). m–p, On the 
ventral surface there was complete absence of GFP+ and Lyve-1+ lymphatic 
vessels (m); Lyve-1 staining was retained in tissue-resident macrophages 
(n). Loss of LECs correlated with a loss of Prox1 (o, p). q–t, In control Vav1-
Cre;Prox1fl/+ mice at E17.5 (n = 5) there was evidence of appropriate targeting 
of GFP+ LECs (q) and an extensive lymphatic network on the dorsal surface as 
indicated by Lyve-1 (r). Prox1 expression was retained (s), which at higher 
resolution revealed co-expression of GFP+/Lyve-1+/Prox1+ in a 
subpopulation of LECs, consistent with the lineage trace data (Fig. 3a–d; white 
inset box in s shown at higher magnification in t). u–x, In Vav1-Cre;Prox1fl/fl 
mutant hearts (n = 8) there was equivalent GFP targeting (u) and a Lyve-1+ 
network (v) with retained Prox1 expression (w, x). y, z, Specific loss of Lyve-1 
staining (y) correlated with loss of Prox1 and GFP-targeting (z, left and right 
panels, respectively, highlighted by white arrowheads). Mosaic levels of Prox1 
knockdown accounted for examples of isolated LECs that, despite GFP-
targeting, remained Lyve-1+ (white arrows in y, z). Scale bars: d, g, l, p, s, x, 400 
µm; h, t, 40 µm; y, 5 µm. 
Extended Data Figure 10 | Prox1 knockdown in Tie2-Cre;Prox1fl/fl mutants 
results in a hypoplastic and disrupted lymphatic plexus. a–f, Relative to Tie2-
Cre;Prox1fl/+ control hearts at E17.5 (a–c), GFP+ lymphatic vessels were 
thinner and the network truncated along the short axis, having failed to 
appropriately extend and remodel in Tie2-Cre;Prox1fl/fl hearts with partial 
knockdown of Prox1 (d–f; see Extended Data Fig. 7c; n = 4 hearts per genotype; 
representative regions indicated by white inset boxes in a, d). g–l, Higher 
magnification images of the lymphatic plexus in Tie2-Cre;Prox1fl/+ control (g–i) 
and Tie2-Cre;Prox1fl/fl mutants (j–l) were captured for AngioTool analyses. 
AngioTool tracing in red of GFP+ vessels and blue for branch points 
(g–l) enabled quantitative assessment of vessel parameters. m–o, The mutant 
lymphatic vessels were significantly shorter in overall length (m), more 
truncated and disorganized with an increased total number of end points 
(n). Mutant vessels were also significantly reduced in diameter, being thinner 
on average, compared to controls (o). Scale bars: c, f, 400 µm; l, 30 µm. All 
graphs show mean ± s.e.m. Student’s t-test; *P ≤ 0.05; **P ≤ 0.001 (n = 4 
hearts analysed per genotype). 



doi:10.1038/nature14488 


Interaction and signalling between a cosmopolitan 
phytoplankton and associated bacteria 


S. A. Amin’, L. R. Hmelo®, H. M. van Tol', B. P. Durham’, L. T. Carlson’, K. R. Heal', R. L. Morales', C. T. Berthiaume’, 
M. S. Parker’, B. Djunaedi’, A.E. Ingalls’, M. R. Parsek*, M. A. Moran® & E. V. Armbrust! 


Interactions between primary producers and bacteria impact the 
physiology of both partners, alter the chemistry of their environ- 
ment, and shape ecosystem diversity’”. In marine ecosystems, these 
interactions are difficult to study partly because the major pho- 
tosynthetic organisms are microscopic, unicellular phytoplank- 
ton®. Coastal phytoplankton communities are dominated by 
diatoms, which generate approximately 40% of marine primary 
production and form the base of many marine food webs‘. 
Diatoms co-occur with specific bacterial taxa’, but the mechanisms 
of potential interactions are mostly unknown. Here we tease apart 
a bacterial consortium associated with a globally distributed 
diatom and find that a Sulfitobacter species promotes diatom 
cell division via secretion of the hormone indole-3-acetic acid, 
synthesized by the bacterium using both diatom-secreted and 
endogenous tryptophan. Indole-3-acetic acid and tryptophan 
serve as signalling molecules that are part of a complex exchange 
of nutrients, including diatom-excreted organosulfur molecules 
and bacterial-excreted ammonia. The potential prevalence of this 
mode of signalling in the oceans is corroborated by metabolite and 
metatranscriptome analyses that show widespread indole-3-acetic 
acid production by Sulfitobacter-related bacteria, particularly in 
coastal environments. Our study expands on the emerging recog- 
nition that marine microbial communities are part of tightly con- 
nected networks by providing evidence that these interactions are 
mediated through production and exchange of infochemicals. 

In terrestrial systems, interactions between photosynthetic organisms 
and bacteria occur primarily within the rhizosphere, a region surround- 
ing plant roots in which gradients of released molecules support distinct 
microbial communities, enhancing the growth of some bacteria while 
restricting the growth of others”. In aquatic systems, similar interactions 
were proposed over 40 years ago to occur within the phycosphere, a 
rhizosphere analogue®. Today, theoretical and empirical studies confirm 
that phytoplankton are surrounded by a diffusive boundary layer in 
which secreted molecules accumulate in excess of bulk seawater con- 
centrations’”*, enhancing the potential for bacterial detection of, and 
communication and interaction with, algal cells’. 

Marine diatoms commonly co-occur with members of the Proteo- 
bacteria and Bacteroidetes in laboratory cultures and some natural 
blooms’. To identify mechanisms underlying specific interactions, 
we isolated 49 cultivable bacterial strains co-occurring with four iso- 
lates of the coastal diatom Pseudo-nitzschia multiseries originating 
from the Pacific and the Atlantic Oceans (Extended Data Table 1). 
We focus on P. multiseries, a diatom with a publicly available 
draft genome, because of its ubiquitous distribution in coastal ecosys- 
tems, ecological importance as a harmful alga’, and relatively large size 
(~50 µm). 

Bacteria affiliated with the Sulfitobacter, Hyphomonas, Marinobacter, 
Limnobacter, and Croceibacter were among isolated bacteria and displayed
more than 97% identity in 16S rRNA sequences regardless
of the originating P. multiseries culture (Extended Data Fig. 1
and Supplementary Information Table 1). The potential impact 
of these bacteria on host physiology was examined by first 
curing P. multiseries PC9 of bacteria via antibiotic treatment'®. The 
specific growth rate of P. multiseries PC9 was not significantly
affected by removal of its bacterial consortium in the short term
(μ_axenic = 0.75 ± 0.03; μ_consortium = 0.80 ± 0.10 d⁻¹) (Fig. 1a).
Over the longer term (>18 months), the growth rate decreased significantly
(to μ_axenic < 0.3 d⁻¹), implying dependence on bacteria"'.
Within 7 months of curing it of bacteria, P. multiseries PC9 was
co-cultured with individual bacterial strains in a synthetic seawater
medium lacking added organic carbon”, ensuring bacterial growth
was dependent upon diatom-released organic molecules. Growth
rates of the cured diatom were unaffected when co-cultured with
Marinobacter or Limnobacter strains, whereas a Croceibacter strain
was lethal. Four Sulfitobacter strains significantly enhanced the specific
growth rate of the diatom by 18-35% despite use of a medium
optimized to support axenic diatom growth” (Fig. 1a, Extended Data
Fig. 2a and Extended Data Table 2). Co-culture with two Phaeobacter
strains closely related to Sulfitobacter did not enhance diatom growth
(Extended Data Table 2). Together, these results indicate that the
Sulfitobacter strains produce a diatom growth-altering factor. A single
strain of Sulfitobacter (SA11) was chosen for further study.

The growth effect of SA11 appears remarkably specific. No growth
enhancement was observed when SA11 was co-cultured with another
diatom, Thalassiosira pseudonana, or with two of the four strains of
P. multiseries (Fig. 1b, Extended Data Table 3 and Extended Data
Fig. 2b). SA11 cell numbers increased over time in co-culture with
responsive P. multiseries strains (PC9, GGA2), but not with a non-responsive
strain (PC4) (Fig. 1 insets), implying the diatom somehow
modulates SA11 growth. To identify pathways involved in the interaction,
we generated a draft genome sequence for SA11 and used a combination
of comparative whole-cell transcriptomics and targeted metabolite analyses
of the partners grown either in isolation or in co-culture.

Figure 1 | Growth characteristics of the P. multiseries-Sulfitobacter sp.
SA11 co-culture. a, Growth of axenic P. multiseries PC9, PC9 with SA11, and
PC9 with the bacterial consortium as monitored by relative chlorophyll a
fluorescence. Inset: cell concentration of SA11 grown without PC9 (filled
squares) or with PC9 (open squares). Error bars, s.d. of triplicate cultures.
Axenic versus co-culture with SA11 growth experiments were replicated five
times. b, Abundance of axenic P. multiseries GGA2, GGA2 with SA11, axenic
P. multiseries PC4, and PC4 with SA11. Inset: cell concentration of SA11
grown with GGA2 (circles) or with PC4 (triangles). Error bars, s.d. of
triplicate cultures.

Figure 2 | Model of P. multiseries-Sulfitobacter interactions based on
transcriptomic and targeted metabolite analyses. Molecules with a structure
indicate detection in the co-culture supernatant. Genes/transporters/metabolic
cycles are shown as upregulated (red), downregulated (blue), or not differentially
regulated (white) in co-culture relative to monocultures. Metabolic
cycles were assigned an expression pattern if at least one gene specific for the
cycle was differentially expressed and no others were regulated in the opposite
direction. Supplementary Information Tables 1 and 2 list fold-expression
and statistical significance based on triplicate biological experiments. IAA
potentially regulates expression of two cyclins that typically regulate the cell
cycle®. Trp, tryptophan; DMS, dimethyl sulfide; PSI, PSII, photosystem I, II;
CYC2, CYC8, cyclins 2 and 8; IAALD, indole-3-acetaldehyde.

In co-culture, P. multiseries PC9 provided SA11 with the organic
carbon necessary for growth, as evidenced by increased SA11 cell
numbers (Fig. 1a inset). Transcriptome changes implicated diatom-produced
taurine, a sulfonated intracellular metabolite previously
identified in several Pseudo-nitzschia species’. P. multiseries increased
transcription of cysteine dioxygenase (cdo), the enzyme that catalyses 
the first step in biosynthesis of taurine from L-cysteine and whose 
activity is correlated with intracellular taurine concentrations”. 
SA11 increased abundance of transcripts required for taurine uptake 
(tauABC) and catabolism to acetate (tpa, xsc, ackA), which can feed 
into the TCA cycle’*. SA11 appears particularly responsive to diatom- 
produced organosulfur molecules as transcripts associated with 
dimethylsulfoniopropionate (DMSP) lyase (dddL) also increased 
(Fig. 2 and Supplementary Information Tables 2 and 3), suggesting 
degradation of DMSP to acrylate and dimethylsulfide’’. P. multiseries 
also increased the abundance of transcripts associated with photosys- 
tem II (psbB), light-harvesting proteins (LHCA4, LHCF4), fucox- 
anthin, and enzymes in the Calvin cycle while reducing most 
transcripts associated with genes in the TCA cycle (Fig. 2 and 
Supplementary Information Table 2). These observations suggest 
decreased respiration and increased photosynthesis and carbon fixa- 
tion, perhaps fuelling carbon excretion to SA11. 

Symbiotic interactions, such as terrestrial plant-microbe interac- 
tions, commonly involve exchange of reduced nitrogen®. The sole 
source of added nitrogen in the growth medium was nitrate’. 
However, in co-culture, P. multiseries decreased abundance of tran- 
scripts associated with nitrate transport (NRT1, NRT2) and reduction 
to ammonia (NR, NiR) while SA11 increased the abundance of tran- 
scripts associated with nitrate uptake (nitTABC) and reduction to 
ammonia (nirB) (Fig. 2 and Supplementary Information Tables 2 
and 3). Significantly more ammonium was detected in the co-culture 
medium than in the medium blank or when the diatom was cultured 
alone, indicating that SA11 released a fraction of its imported nitrate 
into the media as ammonium (Extended Data Fig. 3a). Together, these 
results suggest that, in co-culture, SA11 increases nitrate uptake and 
ammonium release and that P. multiseries preferentially utilizes bac- 
terial-derived ammonium for growth, rather than exogenous nitrate. 
Addition of NH4Cl to axenic P. multiseries had no impact on growth
(μ: 0.57 ± 0.03 d⁻¹ vs 0.55 ± 0.02 d⁻¹), indicating that although
reduced nitrogen was essential to the interaction, another molecule
was responsible for the growth effect.

Tryptophan and related derivatives are common signalling molecules 
in the marine environment'®””. In co-culture, P. multiseries increased 
transcript abundance for the conversion of indole to tryptophan and for 
a putative tryptophan/tyrosine permease (Fig. 2 and Supplementary 
Information Table 2), suggesting increased biosynthesis and export of 
tryptophan. Tryptophan was detected in the growth media after harvesting
cells when P. multiseries was maintained alone (448 ± 106 pM)
or in co-culture (202 ± 20 pM) (Extended Data Fig. 3b). A reduced
tryptophan concentration in the co-culture suggests that SA11
may be importing diatom-released tryptophan. Increased transcript
abundance for endogenous tryptophan biosynthesis and decreased 
transcript abundance for tryptophan degradation by SA11 (Fig. 2 and 
Supplementary Information Table 3) suggest that SA11 increased util- 
ization of extra- and intracellular tryptophan in co-culture. 

Co-culture with P. multiseries triggered SA11 to increase transcripts 
associated with the indole-3-acetamide (IAM) and tryptamine (TAM) 
pathways (Fig. 2 and Supplementary Information Table 3) that convert 
tryptophan to indole-3-acetic acid (IAA), an endogenous plant hor- 
mone that is also produced and excreted by rhizobia to skew symbiotic 
plant development’*. IAA was detected in the growth medium when 
SA11 was maintained alone or in co-culture (Extended Data Fig. 3c). 
Assuming a constant rate of production and release of IAA by SA11, the
concentration in the co-culture (6.1 ± 0.4 pM) was significantly lower
than predicted (540 pM) (see Methods), implying that P. multiseries
takes up a minimum of 5 amol IAA per cell each day. We confirmed that
P. multiseries is responsive to a narrow range of synthetic IAA (50-100 
nM) added either once during the growth cycle (Extended Data Table 4) 
or as multiple 50 nM additions over 8 days (Extended Data Fig. 4). The 
growth enhancement by SA11 (19-35%) versus IAA (~10%) suggests 
that SA11 produces other molecules that further enhance P. multiseries 
growth. Furthermore, the difference in orders of magnitude in concen- 
tration of synthetic versus bacterial IAA required to stimulate a diatom 
response (nanomolar vs picomolar) reiterates the potential importance 
of phycosphere interactions where local concentrations of IAA are 
significantly higher than bulk concentrations in the media, an obser- 
vation consistent with previous work on diffusive boundary layers”””. 

To explore the potential prevalence of these interactions in natural 
populations, we looked for evidence of bacterial production of IAA by
performing targeted metabolite analysis on seawater from the surface 
and chlorophyll maxima at five stations from different regions of the 
North Pacific Ocean (Extended Data Fig. 5). Since IAA has no clear 
metabolic role in bacteria’®, production and excretion of IAA in nat- 
ural populations might indicate bacterial manipulation of responsive 
phytoplankton similar to rhizobia’. Extracellular IAA (1.5-383 pM) 
was detected in all samples, with the highest concentrations detected in 
coastal sites with high phytoplankton abundance (Fig. 3a and 
Extended Data Fig. 3c). The range of measured IAA concentrations 
in the environment is comparable to that measured in our co-culture, 
indicating that environmental IAA could elicit a response from diatoms
associated with IAA-producing bacteria.

Figure 3 | Detection of IAA and IAA biosynthesis in the marine
environment. a, IAA concentrations at five stations in the North Pacific Ocean
from surface (black) and chlorophyll maxima (red) waters. b, Abundance of
transcripts from the three IAA biosynthetic pathways present in the
Roseobacter. Thick bars represent transcripts per litre associated with any gene
in the pathways calculated on the basis of an internal standard; thin bars
represent percentage IAA biosynthesis transcription contributed by each
pathway for data sets in which no internal standard information was available.
Genes used in each pathway are in Extended Data Fig. 6. ALOHA coincides
with station 16 in Fig. 3a.

The Roseobacter clade, to which SA11 belongs, is among the most
ubiquitous lineages observed with phytoplankton*”, and active IAA
production by this group could impact diverse phytoplankton species,
many of which have been shown to respond to synthetic IAA”. To
determine whether the Roseobacter produced IAA in the field, we
examined metatranscriptomic data sets for transcripts associated with
the three IAA biosynthetic pathways found in publicly available
Roseobacter genomes: the indole-3-acetonitrile (IAN), IAM and
TAM pathways (Extended Data Fig. 6). We analysed transcripts collected
at two coastal stations in the North Pacific Ocean and from three
publicly available metatranscriptome data sets from Monterey Bay, the
California coast, and station ALOHA in the North Pacific Gyre
(Extended Data Fig. 5). In all data sets, the three IAA biosynthetic
pathways were actively transcribed with an average abundance of
10’ 1‘ (~0.01% of total transcripts). Transcripts associated with the 
IAN pathway dominated all data sets, with TAM and IAM transcripts 
contributing 10-40% of total IAA transcription (Fig. 3b). Although 
SA11 also possesses the IAN pathway, no IAN transcripts were 
detected in our laboratory experiments, suggesting complex regulatory 
processes control different pathways. Our results thus present a lower 
limit on IAA biosynthesis as other bacterial taxa probably also produce 
IAA, and other IAA biosynthesis pathways (Extended Data Fig. 6) 
may be active. 

The P. multiseries-Sulfitobacter model system developed here 
demonstrates the complexity of microbial interactions, potentially 
occurring within a phycosphere that concentrates hydrophobic signal- 
ling molecules and persists despite seawater turbulence”. 
Tryptophan secretion by P. multiseries may attract a wide range of 
bacteria, but only bacteria that can convert tryptophan to IAA could 
create a positive feedback loop between diatom tryptophan and bac- 
terial IAA (Fig. 2). Accumulation of IAA to significantly higher local 
concentration around algal cells relative to seawater*® would ensure 
that IAA producers residing within the phycosphere could skew the 
growth of algae whereas distant bacteria would not. Exchange of essen- 
tial molecules such as ammonia and organosulfur compounds would 
further enhance synergy. Additional signalling molecules between bac- 
teria and diatoms and among bacteria are probably key to recognizing 
and sustaining beneficial partners and excluding cheaters. Such added
specificity could explain the different responses of closely related 
P. multiseries strains to SA11 and is reminiscent of legume-rhizobia 
interaction specificity achieved through multiple signalling mole- 
cules”. In this context, signalling may distinguish between organisms 
with a long history of association and organisms with latent capacity 
for interaction”®. 

Besides diatoms, several unicellular green algal lineages and cyano- 
bacteria have also been shown to respond to synthetic IAA””*”?. 
Direct detection of algal responses to IAA in the field is not yet possible 
as the genetic basis for algal responses to IAA remains unknown”. 
Further work is needed to characterize these genetic elements. The 
interactions described here illustrate how bacterial influence on phyto- 
plankton physiology may be linked to the global carbon cycle and algal 
bloom formation, and probably affect ecosystem functioning. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

Diatom growth and axenic culture generation. Milli-Q water (>18.2 MΩ cm)
was used for all synthetic seawater media preparations. T. pseudonana CCMP 1335 
was acquired from National Center for Marine Algae and Microbiota. P. multi- 
series strains were isolated from seawater samples collected from locations indi- 
cated in Extended Data Table 1 and were identified using Automated Ribosomal 
Intergenic Spacer Analysis (ARISA) according to ref. 31. Cultures were initially 
grown in f/2 medium” and were acclimated and further maintained for all experiments
in the synthetic seawater medium, Aquil’”. All cultures were grown at 13 °C
in a 16 h light/8 h dark diurnal cycle (80 µE m⁻² s⁻¹) in semi-continuous batch
cultures** with an initial cell density ~8,000-10,000 cells per millilitre for non-axenic
cultures and ~2,000-4,000 cells per millilitre for axenic cultures. Diatom
growth was monitored by measuring in vivo fluorescence using a 10-AU fluorom- 
eter (Turner Designs) or by counting cells using a Sedgwick-rafter (Wildlife Supply 
Company). Growth rates were estimated by measuring in vivo chlorophyll a 
fluorescence (relative fluorescence units) or cell counts. Specific growth rates (μ)
were calculated from the linear regression of the natural log of in vivo fluorescence
or cell counts versus time during the exponential growth phase of cultures.
Standard deviation of μ was calculated from μ values from biological replicates
(n = 3 unless otherwise indicated) over the exponential growth period. Percentage
growth enhancement was calculated as the difference between μ_co-culture and μ_axenic
divided by μ_co-culture.
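
The growth-rate calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names and the synthetic fluorescence readings are hypothetical, and only the regression of ln(signal) on time and the enhancement formula are taken from the text.

```python
import math

def specific_growth_rate(days, signal):
    """Slope of the least-squares line through ln(signal) versus time (d^-1).

    `signal` may be in vivo fluorescence (RFU) or cell counts taken during
    exponential growth; the inputs used below are illustrative assumptions.
    """
    logs = [math.log(s) for s in signal]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    sxx = sum((t - mean_t) ** 2 for t in days)
    return sxy / sxx

def percent_enhancement(mu_coculture, mu_axenic):
    """Difference between the two rates divided by mu_coculture, in percent."""
    return 100.0 * (mu_coculture - mu_axenic) / mu_coculture

# Hypothetical readings: a culture growing at exactly 0.75 d^-1 from 10 RFU.
days = [0, 1, 2, 3, 4]
rfu = [10.0 * math.exp(0.75 * t) for t in days]
mu = specific_growth_rate(days, rfu)

# Rates reported for the transcriptome experiment (0.94 and 0.75 d^-1).
enhancement = percent_enhancement(0.94, 0.75)
```

With the two rates reported for the transcriptome experiment, this definition gives an enhancement of roughly 20%.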

Axenic cultures were generated by adopting the protocol from ref. 10 with 
minor modifications as described below. Approximately 25 ml of a mid-exponential
phase growing diatom culture was gravity filtered onto a 0.65 µm pore-size
polycarbonate membrane filter (Millipore). Cells were quickly rinsed with sterile
Aquil media. Using sterile tweezers, the filter was carefully removed from the
filtration unit and washed for ~1 min in sterile media containing 20 µg ml⁻¹
Triton X-100 detergent to remove surface-attached bacteria. The filter was discarded
after re-suspension of cells by gentle shaking in sterile detergent-free
media. Cells were again gravity filtered onto a fresh 0.65 µm pore-size polycarbonate
membrane filter and rinsed with sterile media. Subsequently, cells were
washed off the filter by gentle shaking into sterile media containing a suite of
antibiotics (per millilitre: 50 µg streptomycin, 67 µg gentamycin, 20 µg ciprofloxacin,
2.2 µg chloramphenicol, and 100 µg ampicillin). Cells were incubated in
antibiotic-containing media for 24-48 h under regular growth conditions. 
Finally, 0.5-1.0 ml of antibiotics-treated cells were transferred to antibiotic-free 
media. Cultures were regularly monitored (every four or five transfers, ~1 month) 
for bacterial contamination by checking for bacterial growth in Zobell marine 
broth** in addition to using Sybr Green I (Invitrogen) staining and epifluorescence 
microscopy (Nikon Eclipse 80i) as described previously**. Bacterial contamination 
was observed only once over the course of ~18 months for PC9 owing to human 
error; the culture was discarded and fresh axenic cultures were prepared as above. 
Bacterial growth, isolation, and classification. Bacteria were typically grown on 
marine agar plates (per litre: 5 g peptone, 0.5 g yeast extract, 15 g agar, and 750 ml 
seawater) incubated at 20 °C in the dark or in marine broth™ at 30 °C with shaking 
at 150 r.p.m. Bacterial growth was measured by counting colony-forming units or 
by using a Guava EasyCyte Plus flow cytometer (Millipore) after cells were stained
with Sybr Green I stain. 

Bacteria were isolated from late-exponential phase growing P. multiseries cul- 
tures by serially diluting 0.5 ml aliquots of culture into sterile Aquil. Diluted 
aliquots were then plated onto agar plates containing, per litre of seawater, 15 g 
agar and 2 g of a carbon source (peptone and yeast extract, succinate, glucose, CAS
amino acids, or only background organic carbon in seawater). Plates were incu- 
bated at room temperature in the dark and morphologically different bacterial 
colonies were isolated and stored in 15% glycerol stocks at −80 °C for future
experiments. 

To identify isolated bacteria, isolates were grown from single colonies in marine 
broth overnight and cells were centrifuged at 13,000g for 2 min. The supernatants 
were removed and DNA was extracted using a DNA Blood & Tissue kit (Qiagen) 
according to the manufacturer’s instructions. Using universal 16S rDNA primers 
(27F, 1492R), 16S rDNA from all bacterial isolates was amplified using a Taq DNA 
polymerase kit (Apex). The temperature profile for PCR consisted of an initial 
incubation at 94 °C for 3 min, followed by 32 cycles of 94°C for 30 s, 55 °C for 1 
min and 72 °C for 2 min, and a final extension step at 72 °C for 20 min. Amplified 
product was cleaned using a High Pure PCR Product Purification Kit (Roche). 
Purified PCR products were sequenced using Sanger technology (Genewiz). 

Sequences were quality trimmed using Sequencher 4.6 (Gene Codes) and initially
aligned using ClustalW as implemented in Mega 5.2.2 (ref. 36). The alignment
was refined using NAST (http://greengenes.lbl.gov). Phylogenetic inference
of the masked alignment was based on maximum likelihood, using the JIT model
with bootstrap support of 100 replicates as implemented in Mega 5.2.2 (ref. 36).
Sequences were deposited in GenBank under accession numbers KM033232- 
KM033280. 

Co-culture experiments. Because we observed long term (>18 months) growth 
rate differences between axenic and non-axenic P. multiseries, all co-culture 
experiments (including the transcriptome and metabolite analyses experiments) 
were conducted within 7 months of curing the diatom of bacteria. All experiments 
were conducted in Aquil’’. Bacteria were plated freshly before each experiment on 
marine agar and were grown from single colonies in marine broth overnight 
(30°C, 150 r.p.m.). Cells were centrifuged (3,500g for 5 min), washed twice with 
sterile Aquil, and diluted to a stock cell density of ~1 X 10° cells per millilitre with 
sterile Aquil. This stock was used to inoculate the freshly prepared diatom culture 
to achieve a final bacterial cell density of ~1 × 10⁵–2 × 10⁵ cells per millilitre.
Diatoms were inoculated from an early to mid-exponential phase growing culture 
into fresh media to an initial diatom cell density of ~2,000-4,000 cells per millilitre 
to achieve an ~50:1 bacteria:diatom ratio. Diatom and bacterial growth were 
measured as described above. For experiments where Sulfitobacter sp. strain 
SA11 was grown alone, Aquil was supplemented with 11 mM glucose as the sole 
carbon source except for the transcriptome experiment (see below), where only 1
µM glucose was used.
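
The inoculation arithmetic in this paragraph amounts to a C1·V1 = C2·V2 dilution followed by a ratio check. The sketch below is illustrative only: the stock density, target density and culture volume are placeholder values, chosen so that the result matches the ~50:1 bacteria:diatom ratio at 3,000 diatom cells per millilitre, and the function names are hypothetical.

```python
def inoculum_volume_ml(stock_cells_per_ml, target_cells_per_ml, culture_volume_ml):
    """Volume of bacterial stock needed to reach the target density.

    Uses C1*V1 = C2*V2 and ignores the small volume added by the inoculum.
    """
    if stock_cells_per_ml <= target_cells_per_ml:
        raise ValueError("stock must be denser than the target")
    return target_cells_per_ml * culture_volume_ml / stock_cells_per_ml

def bacteria_per_diatom(bacteria_cells_per_ml, diatom_cells_per_ml):
    """Bacteria:diatom cell ratio at inoculation."""
    return bacteria_cells_per_ml / diatom_cells_per_ml

# Placeholder example: 1e8 cells/ml stock, 1.5e5 cells/ml target, 25 ml culture.
volume = inoculum_volume_ml(1e8, 1.5e5, 25.0)
ratio = bacteria_per_diatom(1.5e5, 3000)
```

Here `volume` comes out at 0.0375 ml of stock and `ratio` at 50 bacteria per diatom cell.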

For the transcriptome experiments, axenic P. multiseries strain PC9 was used 
to inoculate 2 l sterile polycarbonate bottles. Treatments consisted of (1) PC9
and SA11 co-culture, (2) axenic PC9, and (3) SA11 supplemented with 1 µM
glucose. All treatments were in triplicate. Growth rates for PC9 were μ_axenic =
0.75 ± 0.03 d⁻¹ and μ_co-culture = 0.94 ± 0.04 d⁻¹. Growth rates for SA11 were
μ_glucose = 0.45 ± 0.01 d⁻¹ and μ_co-culture = 0.46 ± 0.02 d⁻¹. Cells were harvested at
mid-exponential growth (96 h after inoculation for all treatments) by filtering the
culture through a 3 µm pore-size polycarbonate filter to capture the majority
of diatom cells (this step was skipped for treatment 3), followed by filtration
through a 0.22 µm pore-size polycarbonate filter to capture bacteria. Final cell
densities at time of harvesting were PC9_axenic ~3.5 × 10° cells per millilitre,
PC9_co-culture ~5.8 × 10* cells per millilitre, SA11_co-culture ~1.7 × 10° cells per
millilitre, and SA11_glucose ~3 × 10° cells per millilitre. Filters were immediately
flash frozen in liquid nitrogen and later stored at −80 °C. Flow-through media
were used for targeted metabolite analyses (see below). 

IAA addition experiments. IAA (Sigma-Aldrich) was dissolved in Milli-Q water 
and sterilized using syringe filtration. P. multiseries strain GGA2 was inoculated 
into fresh Aquil media supplemented with different concentrations of IAA or 
equivalent volumes of Milli-Q water (control). For IAA additions in Extended 
Data Fig. 5, IAA or Milli-Q water was added every 2 days as described to simulate a
continuum of active IAA concentrations in the media. 

P. multiseries genome. A publicly available genome of strain CLN-47 was used
for our analysis (http://genome.jgi-psf.org/Psemu1/Psemu1.home.html).

SA11 genome. All nucleic acid quantifications were measured using a Qubit 
Fluorometer (Invitrogen; Life Technologies). DNA was extracted using a 
Qiagen DNA Blood and Tissue kit according to the manufacturer’s instructions. 
Ten micrograms of DNA were sheared to 2-3 kb using a Hydroshear (Genomic 
Solutions) with a standard shearing assembly. To prepare the DNA for SOLiD 60 
base pair (bp) X 60 bp mate-pair sequencing, we used a unique protocol combin- 
ing different steps from the SOLiD 3, SOLiD 4, and SOLiD 5500 mate-paired 
protocols (R.M. & E.V.A., manuscript in preparation). The library was attached 
to beads by emulsion PCR, which was done at the Life Technologies Research and 
Development Unit. For sequencing, 50 million library-containing beads were 
deposited onto one spot of an eight-spot slide and run on a SOLiD 4 Next 
Generation Sequencer (Life Technologies). In addition to SOLiD sequencing, 
we used Ion Torrent to improve the genome assembly. DNA was prepared as 
described above. Ion Torrent library preparations were performed according to 
the manufacturer’s instructions. 

The SA11 genome was assembled using a combination of fragment Ion Torrent 
reads and 50 X 50 mate-paired SOLiD colorspace reads with an insert size of 2,200 
bp (s.d. 800 bp). Fastq_quality_trimmer from the FASTX Toolkit was used to quality
trim and filter Ion Torrent reads with parameters ‘-Q33 -l 50 -t 14’. SOLiD reads
were screened for PCR duplication artefacts using fastq_nodup from SEAStAR”
with parameters ‘-d 2 --no_prefix’ and were trimmed and filtered on the basis of
quality score using trimfastq from SEAStAR with parameters ‘-p .75 -l 34 -e 3.0
--add_len --no_prefix’. Contigs were created with the de novo assembly tool Newbler.
The contigs were organized into scaffolds using graph_ops from SEAStAR on the 
basis of mate-pair connections identified by a BWA alignment of SOLiD reads to
Ion Torrent contigs. Contigs were also created for a de novo assembly of SOLiD 
reads using Velvet with a kmer size of 31, coverage cutoff of 35, expected coverage 
of 200, insert size of 2,200, insert size standard deviation of 800, scaffolding 
disabled, and a minimum contig length of 100. These contigs were used to fill 
gaps between scaffolded Ion-Torrent-based Newbler contigs where possible. 
The genome can be publicly accessed through IMG (http://img.jgi.doe.gov; submission
11682).

PC9 transcriptome. Total RNA was isolated from 3-µm filters using a TOTALLY
RNA Total RNA Isolation Kit (Ambion; Life Technologies), and messenger RNA
(mRNA) was purified from the total RNA using a MicroPoly(A)Purist Kit
(Ambion; Life Technologies). SOLiD 75 bp × 35 bp paired-end libraries were
generated from ~500 ng of mRNA from each replicate and treatment using a 
SOLiD Total RNA-Seq Kit (Life Technologies) with the gel option according to the 
manufacturer’s instructions. The libraries were attached to beads in-house by 
emulsion PCR according to the SOLiD manual. For sequencing, 700 million 
library-containing beads were deposited onto a full slide and run on a SOLiD 4 
Next Generation Sequencer (Life Technologies). 

The SOLiD reads were trimmed on the basis of quality score using custom in-house
software*’. Trimmed reads with a length shorter than 28 colorspace transitions
were removed. Filtered reads were then aligned to the P. multiseries draft
genome gene catalogue transcripts provided by JGI
(Psemu1_GeneCatalog_transcripts_20111011.nt.fasta) using BWA (version 0.5.8)
allowing for two mismatches in a seed length of 18, and up to four mismatches
across an entire read.
Anti-sense reads were removed and counts for SOLiD reads aligning to gene 
catalogue transcripts were calculated from the resulting SAM alignment file”® 
using SEAStAR”’. SEAStAR counts were then analysed with edgeR” for differ- 
ential expression and significance testing using Benjamini Hochberg multiple 
testing corrections. Count tables were then merged with KEGG annotations pro- 
vided by JGI. Using a false discovery rate cutoff of 0.05, 2,143 genes were differ- 
entially expressed in PC9 out of ~19,703 gene models. 
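
For readers unfamiliar with the Benjamini-Hochberg correction named above, the procedure can be sketched as follows. In the analysis described here the adjustment is performed within the edgeR workflow; this standalone Python version only illustrates the standard step-up procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p = [0.001, 0.008, 0.039, 0.041, 0.6]  # hypothetical per-gene p-values
q = benjamini_hochberg(p)

# Genes passing a false discovery rate cutoff of 0.05, as used for PC9.
significant = [i for i, value in enumerate(q) if value < 0.05]
```

In this toy example only the first two genes pass the cutoff, although four of the raw p-values are below 0.05.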
SA11 transcriptome. RNA was isolated from 0.22-µm filters using an RNeasy
Mini Kit along with RNAprotect Bacteria Reagent (Qiagen) according to the
manufacturer’s instructions using 1 mg ml⁻¹ lysozyme solution (Fisher) to lyse
the cells. Total RNA was treated for DNA contamination using two successive 
treatments with Turbo-DNase (Ambion; Life Technologies) and cleaned/concen- 
trated with an RNeasy MinElute Cleanup Kit (Qiagen). Ribosomal RNAs (rRNAs) 
were selectively removed using a subtractive hybridization protocol” with bioti- 
nylated rRNA probes specific to the organism(s) in each sample (for example, 16S 
and 23S for SA11 mono- and co-cultures or eukaryotic 18S and 28S for co-cultures).
Subtracting diatom rRNAs was essential as the first filtration through the 3-µm
pore-size filter (see above) did not completely remove all diatom cells. Probe-bound
RNAs were removed with streptavidin-coated magnetic beads (New
England Biolabs). rRNA-depleted samples were then linearly amplified using a 
MessageAmp II-Bacteria kit (Ambion; Life Technologies). 

Amplified mRNA was then converted into complementary DNA (cDNA) using 
a SuperScript III First-Strand Synthesis System for RT-PCR kit (Ambion; Life 
Technologies) and a NEBNext mRNA Second Strand Synthesis Module (New 
England Biolabs) according to the manufacturer’s instructions. SOLiD 75 bp ×
35 bp paired-end libraries were generated from 1 µg of cDNA using a SOLiD
Fragment Library Construction Kit according to the manufacturer’s instructions. 
The libraries were attached to beads in-house by emulsion PCR according to the 
SOLiD manual. For sequencing, 450 million library-containing beads were depos- 
ited onto three lanes of a flow cell and run on a SOLiD 5500 Next Generation 
Sequencer (Life Technologies). 

SOLiD reads were trimmed and filtered to remove low-quality or low information
content sequence using trimfastq from SEAStAR with settings ‘trimfastq -z
-s --add_len -l 30 -p .9 -e 3.0’. Trimmed reads were aligned to the IMG-derived SA11
reference genome using BWA with settings ‘-k 2 -n .001 -l 18 -t 8 -c’ for the aln
subcommand and settings ‘-n 100’ for the samse subcommand. Resulting SAM
files were processed with ref_select and graph_ops from SEAStAR to get per gene 
read counts. These counts were processed with the R package edgeR to identify 
SA11 genes with significant differential expression between the two conditions. 
Using a false discovery rate cutoff of 0.001 and a fold-change ≥2, 2,620 genes were
differentially expressed in SA11 out of ~5,281 open reading frames. 
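
The two-part selection rule for SA11 (false discovery rate cutoff plus a minimum fold-change) can be written as a simple filter. The sketch below is an assumption-laden illustration: the gene names and values are invented, and the fold-change threshold is applied symmetrically to up- and downregulated genes, which the text does not state explicitly.

```python
import math

def is_differentially_expressed(fdr, log2_fold_change,
                                fdr_cutoff=0.001, min_fold=2.0):
    """True if a gene passes both the FDR cutoff and the fold-change threshold.

    Assumes the fold-change threshold applies in either direction.
    """
    return fdr < fdr_cutoff and abs(log2_fold_change) >= math.log2(min_fold)

genes = {  # hypothetical gene -> (FDR, log2 fold change)
    "gene_a": (1e-6, 2.3),   # strong upregulation, significant
    "gene_b": (1e-6, 0.4),   # significant but below twofold
    "gene_c": (0.01, -3.0),  # large change but fails the FDR cutoff
    "gene_d": (5e-4, -1.0),  # exactly twofold down, significant
}
selected = sorted(
    name for name, (fdr, lfc) in genes.items()
    if is_differentially_expressed(fdr, lfc)
)
```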

Culture media targeted metabolite analysis. Ammonium and nitrate concen- 
trations were analysed using an AAII autoanalyzer system (Technicon). To detect 
and quantify IAA and tryptophan, culture supernatants were acidified to pH ~3.5 
using concentrated formic acid (Fisher, Baker Analyzed). Supernatants were 
then passed through conditioned solid-phase extraction (SPE) HLB columns 
(Waters) at a flow rate of ~5 ml min⁻¹ to bind organic molecules. The columns
were washed thoroughly with Milli-Q water and eluted with methanol (ultra- 
performance liquid chromatography (UPLC) grade, Fisher) according to the 
manufacturer’s instructions. Eluted samples were dried under a stream of nitrogen 
gas and were then frozen at −80 °C for later analysis. Before analysis, samples were
dissolved in water. 

Mass spectrometry. Tryptophan and IAA were purchased (Sigma-Aldrich) as 
well as labelled standards for IAA (indole-2,4,5,6,7-d5-3-acetic acid) and tryptophan
(L-tryptophan-2,3,3-d3) (CDN Isotopes). To verify the identity and quantify
IAA and tryptophan in all standards and culture samples, an ultra-performance
liquid chromatography-electrospray ionization-tandem mass spectrometry 
(UPLC-ESI-MS) method adapted from ref. 41 was used, with some changes. A 
quadrupole time-of-flight (Waters Xevo G2-S QTOF) mass spectrometer was used 
for verification of compound identity. For the quadrupole time-of-flight, a cone 
voltage of 2 V and collision energy of 28 V were used when running IAA, and a 
cone voltage of 32 V and collision energy of 20 V were used when running 
tryptophan. Compound quantitation was done using selected reaction monitoring 
(SRM) on a Waters Xevo TQ-S triple quadrupole mass spectrometer. The same 
UPLC method was used for all analyses. Both retention times and SRM transition 
masses were used to target compounds in environmental samples run only using 
the SRM method (Supplementary Information Table 4). The UPLC method used a 
Waters Acquity UPLC system equipped with a Waters Acquity UPLC BEH C18 
column (1.7 µm, 2.1 mm × 50 mm) at 30 °C with the mobile phase consisting of
0.1% formic acid in water (solvent A) and methanol (solvent B). A linear gradient
with a flow rate of 0.2 ml min⁻¹ was used from 0 to 7 min (5% B to 90% B),
followed by 2 min at 90% B, and 3 min linear gradient back to 5% B to re- 
equilibrate the column. Both mass spectrometers were configured with positive 
ion ESI with source conditions as follows: capillary voltage 0.5 kV, source
temperature 130 °C, desolvation temperature 550 °C, cone gas flow at 150 l h⁻¹, and
desolvation gas at 1,000 l h⁻¹.
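
As a reading aid, the gradient programme described above can be written as a piecewise-linear function of run time. The sketch below only restates the stated set points (5% B to 90% B over 0 to 7 min, a 2 min hold at 90% B, and a 3 min linear return to 5% B); the function name is hypothetical and the 12 min total run time is inferred by summing the three segments.

```python
def percent_solvent_b(t_min):
    """Percentage of solvent B (methanol) at time t_min, in minutes.

    Segments, as stated in the text:
      0-7 min   linear ramp from 5% B to 90% B
      7-9 min   hold at 90% B
      9-12 min  linear return from 90% B to 5% B (re-equilibration)
    """
    if t_min < 0 or t_min > 12:
        raise ValueError("time must lie within the 0-12 min programme")
    if t_min <= 7:
        return 5.0 + (90.0 - 5.0) * t_min / 7.0
    if t_min <= 9:
        return 90.0
    return 90.0 - (90.0 - 5.0) * (t_min - 9.0) / 3.0


for t in (0, 3.5, 7, 8, 10.5, 12):
    print(t, percent_solvent_b(t))
```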

Analysis of the retention time and the accurate mass of the molecular ion 
(176.0706) and fragment ion (130.0656) of an authentic standard of IAA con- 
firmed its presence in a monoculture of SA11 (Supplementary Information Table 
4). Fragmentation of IAA was induced by a program that ramped collision energy 
from 20 to 30 V. To quantify tryptophan and IAA in cultures and environmental 
samples, a positive ion mode SRM method was used that monitored the following 
transitions: 176.02 → 130.07 (IAA) (Supplementary Information Table 4) and
188.07 → 118.05 (tryptophan). For SRM, the same cone voltage and collision
energy were used as for the quadrupole time-of-flight.

For quantification of IAA and tryptophan in culture media samples, percentage
recovery during SPE was determined for each molecule in Aquil by acidifying and 
passing different batches of Aquil through SPE HLB columns and treating them 
the same way as culture media described above. Three batches were spiked with 
labelled IAA-d5 and tryptophan-d3 before SPE treatment while the rest were
spiked after SPE. Extracts were then dried, re-dissolved as described above, and 
analysed on a Xevo TQ-S triple quadrupole mass spectrometer. Percentage
recovery was determined on the basis of the peak area of labelled compounds in
treatments spiked before and after SPE (70 ± 16% for IAA; 32 ± 12% for tryptophan).
Reported concentrations were corrected for percentage recovery during SPE. An 
additional batch of Aquil media was extracted by SPE and used to construct a 
standard curve for each molecule to determine the linear range of the detector 
(R² = 0.999 for IAA, R² = 0.98 for tryptophan). Samples were diluted such that they were in the range
of the standard curve concentrations (0-100 nM for IAA and 0-25 nM for tryp- 
tophan) and quantified using isotope-labelled internal standards. Internal stand- 
ard spikes were also within the standard curve concentrations. Concentrations 
calculated from the standard curve and from the internal standards were generally 
similar. Reported concentrations are from the internal standard calculation. No 
IAA contamination was found in blank Aquil. Traces of tryptophan (<1 pM) were
detected in Aquil but were significantly lower than all measured concentrations in
cultures. We attempted to detect taurine in the culture media but were not able to 
do so, presumably because of poor SPE recovery and inappropriate UPLC chro- 
matography column type (C18). 
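
The recovery and quantification arithmetic described in this paragraph can be sketched as below. This is an illustration under stated assumptions, not the authors' processing code: the function names are hypothetical, recovery is taken as the ratio of mean labelled-standard peak areas in batches spiked before versus after SPE, and the internal-standard calculation assumes that the analyte and its isotope-labelled standard give an equal detector response per mole.

```python
def percent_recovery(areas_spiked_before, areas_spiked_after):
    """SPE recovery (%) from labelled-standard peak areas.

    Batches spiked before SPE lose material on the column; batches spiked
    after SPE represent 100% recovery.
    """
    mean_before = sum(areas_spiked_before) / len(areas_spiked_before)
    mean_after = sum(areas_spiked_after) / len(areas_spiked_after)
    return 100.0 * mean_before / mean_after


def correct_for_recovery(measured_conc, recovery_percent):
    """Scale a measured concentration up to account for SPE losses."""
    return measured_conc * 100.0 / recovery_percent


def internal_standard_conc(analyte_area, standard_area, standard_conc):
    """Single-point isotope-dilution estimate (assumes equal response factors)."""
    return analyte_area / standard_area * standard_conc


# Hypothetical peak areas chosen to give a 70% recovery.
recovery = percent_recovery([680.0, 700.0, 720.0], [1000.0, 1000.0, 1000.0])
print(recovery)
print(correct_for_recovery(7.0, recovery))
print(internal_standard_conc(250.0, 500.0, 10.0))
```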

SA11 IAA production rate. In co-culture, the amount of IAA detected is probably 
lower than what is produced by SA11 if we presume there was active removal of 
IAA from solution by the diatom. Because the cell density and the length of the 
exponential growth phase of SA11 differ in co-culture and monoculture, a direct 
comparison between IAA concentrations in both treatments is not informative. To 
calculate the production rate of IAA by SA11, three sets of triplicate cultures of 
Aquil supplemented with 11 mM glucose were inoculated with SA11. Each set of 
cultures was harvested after 2, 3, and 4 days of growth. Organic molecules were 
extracted from the media after removing cells, and IAA was quantified as 
described in the Mass Spectrometry section. IAA production per cell per day 
was calculated from the three sets of cultures that were harvested on days 2, 3, 
and 4. Using the known SA11 cell density in co-culture, we calculated the
minimum expected concentration of IAA in co-culture (540 pM). This concentration
served as a lower limit on the expected in situ IAA concentration that would result
from the measured production rate in monoculture since our transcriptome data 
showed that IAA biosynthesis increased in the co-culture. 

Environmental metatranscriptomics and targeted metabolite analysis. 
Samples were collected in May 2012 at stations 1 and 3 along Line P and in 
August 2013 at stations 1 (48.6965° N, 126.0387° W), 3 (48.8168° N, 128.6648° 
W), 8 (49.9872° N, 144.8077° W), 14 (27.3462° N, 152.6717° W), and 16 (22.7603° 
N, 158.0003° W) for metatranscriptomes and targeted metabolite analysis,
respectively. Seawater samples were collected from the surface using a
conductivity-temperature-depth (CTD) rosette equipped with Niskin bottles (20 l).

For RNA, cells were collected by sequential filtration on a Nitex screen of 53 µm
pore-size to remove large particles, 142 mm 2.0 µm pore-size polycarbonate filters
(mostly eukaryotic), and 142 mm 0.2 µm pore-size Supor filters (mostly
prokaryotic). Results shown are from the combined eukaryotic and prokaryotic size
fractions. Filters were flash frozen in liquid nitrogen and subsequently stored at
−80 °C until processing.

RNA extraction and DNA removal were performed as previously described for 
environmental metatranscriptomics*” with the following modifications: lysis in 
10 ml of Ambion lysis buffer (AM8540G) + 0.5 ml each of 0.5 and 0.1 zirconia 
beads. rRNAs were selectively removed using a subtractive hybridization method” 
with biotinylated rRNA probes specific to the samples (that is, bacterial and 
archaeal 16S and 23S and eukaryotic 18S and 23S). Probe-bound RNAs were 
removed with streptavidin-coated magnetic beads (New England Biolabs).
rRNA-depleted samples were then linearly amplified using a MessageAmp II- 
Bacteria kit (Ambion; Life Technologies). Amplified mRNAs were then converted 
into cDNAs for Illumina sequencing using a Superscript III First-Strand Synthesis 
System (Invitrogen; Life Technologies) followed by the NEBNext mRNA Second 
Strand Synthesis Module (New England Biolabs). cDNAs were then purified using 
a QIAquick PCR purification kit (Qiagen) followed by ethanol precipitation. 
Purified cDNAs were sheared to ~200-250 bp fragments and HiSeq libraries 
(Illumina) were constructed for paired-end (2 × 150) sequencing using an
Illumina HiSeq 2500 platform. After sequencing, paired-end Illumina reads were 
joined using a PANDAseq assembler“*, and paired reads were trimmed using 
FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). 

For the metatranscriptome analyses, rhizobial IAA biosynthesis proteins with 
experimentally verified functions were identified and used to identify homologues 
in SA11 and from Roseobase (http://www.roseobase.org) using BLASTp. Proteins 
with no homologues in Roseobase or SA11 were not included in the analysis 
(Extended Data Fig. 6). In addition, indole-3-acetaldehyde (IAAld) dehydrogen- 
ase, a protein commonly annotated as aldehyde dehydrogenase in rhizobia, was 
not included in the analysis because of the presence of several homologues within 
each Roseobacter genome that are probably involved in other pathways besides 
IAA biosynthesis. Therefore, our analysis in Fig. 3b probably represents an under- 
representation of IAA biosynthesis transcripts in the North Pacific. On average, 
each station had an estimated 3.2 × 10¹¹ transcripts per litre based on the recovery
of the internal standard reads after sequencing, suggesting IAA transcripts
recovered in our analyses represent ~0.01% of the total transcripts. The Roseobacter
reference sequences identified above were used as the query for tBLASTn searches 
to identify transcripts representing Roseobacter-clade IAA biosynthesis genes in 
our North Pacific metatranscriptomes and three publicly available metatranscrip- 
tomes from the North Pacific Gyre, Monterey Bay (California), and the California 
Coastal system (NCBI accession numbers PRJNA244754, PRJNA183166, and 
PRJNA268385, respectively). Only reads with ≥60% sequence identity and
≥140 bp of the read length aligning to the query were included in our final analysis.
Transcript concentrations in seawater for the North Pacific metatranscriptomes
were calculated on the basis of the recovery of the internal standard reads.
Percentage IAA biosynthesis transcription was calculated by dividing the number 
of reads from each pathway by the total. The lack of complete metadata for the 
public data sets prevented the calculation of accurate transcripts per litre. 
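
The read filter and normalisation described above can be sketched as follows. This is an illustrative outline only. The function names are hypothetical, and the number of internal standard molecules added and the volume filtered are assumed inputs that this section does not give. The per-pathway percentage is computed here relative to the total of the pathway read counts supplied, which is one reading of "dividing the number of reads from each pathway by the total".

```python
def passes_read_filter(percent_identity, aligned_bp,
                       min_identity=60.0, min_aligned_bp=140):
    """Keep a tBLASTn hit only if it meets both thresholds stated in the text."""
    return percent_identity >= min_identity and aligned_bp >= min_aligned_bp


def transcripts_per_litre(target_reads, standard_reads_recovered,
                          standard_molecules_added, litres_filtered):
    """Scale read counts to transcripts per litre using internal standard recovery.

    Each recovered standard read is taken to represent
    standard_molecules_added / standard_reads_recovered molecules in the sample.
    """
    molecules_per_read = standard_molecules_added / standard_reads_recovered
    return target_reads * molecules_per_read / litres_filtered


def pathway_percentages(read_counts):
    """Percentage of the supplied reads assigned to each pathway."""
    total = sum(read_counts.values())
    return {name: 100.0 * count / total for name, count in read_counts.items()}


# Hypothetical numbers for illustration only.
print(passes_read_filter(72.5, 148))
print(transcripts_per_litre(50, 1000, 1e8, 2.0))
print(pathway_percentages({"IAM": 3, "IAN": 1}))
```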

For targeted metabolite analysis, samples were collected and treated as 
described in ref. 45. Standard curves for IAA and percentage recovery were deter- 
mined as described for Aquil, except seawater from station ALOHA (station 16) 
was used as matrix. Because seawater samples were mainly collected and processed 
for targeted vitamin B detection®, recovery of IAA was poor but relatively
consistent (32 ± 7%). Concentrations detected were corrected using this percentage
recovery. Detection and quantification were conducted as described in the Mass 
Spectrometry section. 
Extended Data Figure 1 | Phylogeny of P. multiseries-associated bacteria.
Maximum likelihood tree showing the 16S rRNA phylogeny of all bacterial
strains cultivated from P. multiseries isolates. Colour of bacterial strain
designation indicates which isolate of P. multiseries a bacterial strain originated
from: red, PC9; blue, PnCLNN-17; green, PC4; magenta, GGA2 (see Extended
Data Table 1). Genera/clades that were considered to be associated with
P. multiseries (contained two or more isolates from different diatom cultures
with >99% 16S rRNA identity) are highlighted in grey. Bootstrap values greater
than 50 are indicated at the branch points. Detailed information about each
isolate is provided in Supplementary Information Table 1.
Extended Data Figure 2 | Effect of select bacterial strains on growth of
P. multiseries. a, Growth of P. multiseries PC9 in the presence of different 
representative bacteria from its consortium (open circles) relative to axenic 
growth (filled circles). Bacterial representatives (Limnobacter, SA37; 
Marinobacter, SA14; Croceibacter, SA60; Sulfitobacter, SA52; see Extended
Data Table 2) were inoculated at ~1 X 10° cells per millilitre relative to 
~4,000 cells per millilitre axenic PC9. Error bars, s.d. from triplicate cultures. 
b, Growth of P. multiseries IOES-1 in axenic culture or with SA11. Error bars, 
s.d. from four replicates. 
Extended Data Figure 3 | Select metabolite analyses from the
P. multiseries–Sulfitobacter sp. SA11 co-culture and the environment.
a, Dissolved ammonium concentrations in a medium blank, in axenic
P. multiseries PC9, and in PC9 with SA11 (co-culture). Error bars, the range
from duplicate supernatants. b, UPLC-ESI-MS/MS chromatograms of
tryptophan in axenic PC9 or co-culture supernatants. Tryptophan was detected
in positive ion mode by SRM from m/z 188 to 118. A 500 pM tryptophan
standard is shown for retention time comparison. Tryptophan concentrations
in the diatom monoculture and co-culture were 448 ± 106 pM and
202 ± 20 pM, respectively. c, UPLC-ESI-MS/MS chromatograms of IAA from
surface water at station 1, SA11, and co-culture (with PC9) supernatants. IAA
was detected in positive ion mode by SRM from m/z 176 to 130. A 0.5 pM
IAA standard is shown for retention time comparison. IAA concentrations in
the co-culture and SA11 monoculture were 6.1 ± 0.4 pM and 540 ± 280 pM,
respectively.
Extended Data Figure 4 | Effect of multiple exogenous IAA additions on P. multiseries GGA2. Axenic GGA2 was grown in synthetic seawater media and
50 nM IAA was added at times indicated by the red arrows. Error bars, s.d. from six cultures. 
Extended Data Figure 5 | Map of stations in the North Pacific Ocean where
seawater samples were collected. Surface and chlorophyll maximum waters
were collected for targeted metabolite analysis (all stations indicated) and
metatranscriptomics (stations 1 and 3). Station 8 coincides with historic station
PAPA and station 16 coincides with station ALOHA. The different stations
exhibit dramatic differences in chemical and physical characteristics. For
example, stations 1 and 3 are nutrient-rich coastal sites, station 8 is iron-limited,
and stations 14 and 16 reside within the North Pacific Gyre and are
oligotrophic. The map was created with Esri ArcGIS and Esri ArcMap 10.1
software.
Extended Data Figure 6 | IAA biosynthesis pathways in bacteria examined
in the North Pacific Ocean metatranscriptomes. IAA biosynthesis in bacteria
is divided into tryptophan-dependent and -independent pathways. Known
bacterial enzymes involved in IAA biosynthesis all belong to the former (italic
names). Dotted arrows represent biosynthetic steps with no known enzymes in
bacteria¹⁸. Enzyme names are coloured according to the different pathways
present in Roseobacter genomes: green, IAN pathway; red, IAM pathway; cyan,
TAM pathway. Grey enzyme names were not included in our analysis because
either no homologues were found in Roseobacter genomes or, in the case of
IAAld dehydrogenase (belonging to the aldehyde dehydrogenase family), the
presence of multiple homologues within a given genome that were involved in
multiple pathways not related to IAA biosynthesis prevented our ability to
decide on a reliable query for BLAST analysis. IAAld, indole-3-acetaldehyde; IPy,
indole-3-pyruvate. This figure was modified from ref. 18.
Extended Data Table 1 | Diatom species and isolates used in this study

Species                        Strain/isolate name   Origin of isolation     Isolation date
Pseudo-nitzschia multiseries   PnCLNN-17*            Bay of Fundy, Canada    2007
                               PC9                   Penn Cove, WA           2010
                               PC4                   Penn Cove, WA           2010
                               GGA2                  Golden Gardens, WA      2010
                               IOES-1                East Sound, WA          2010
Thalassiosira pseudonana       CCMP 1335             Long Island, NY         1958

* Strains from which most bacteria were isolated.
Extended Data Table 2 | Specific growth rate promotion of P. multiseries isolate PC9 in co-culture with different bacteria

Bacterial genus   Isolate name   µ axenic ± s.d.   µ co-culture ± s.d.   % change in µ
Sulfitobacter     SA11           0.69 ± 0.03       1.06 ± 0.05           35
                  SA30           0.59 ± 0.02       0.84 ± 0.06           30
                  SA44           0.40 ± 0.01       0.49 ± 0.01           18
                  SA52           0.46 ± 0.02       0.62 ± 0.01           26
Phaeobacter       GS35           0.46 ± 0.02       0.52 ± 0.05           11
                  GS36           0.46 ± 0.02       0.49 ± 0.01           6
Limnobacter       SA23           0.64 ± 0.02       0.66 ± 0.08           3
                  SA37           0.59 ± 0.02       0.54 ± 0.09           -8
Marinobacter      SA14           0.64 ± 0.1        0.70 ± 0.06           8
Croceibacter      SA60           0.69 ± 0.03       †                     †

† Growth rate could not be calculated, as the bacterium was algicidal.
Standard deviation values were calculated from biological triplicates.
Extended Data Table 3 | Specific growth rate promotion of different diatoms in co-culture with Sulfitobacter sp. SA11

Species                        Culture name   µ axenic ± s.d.   µ co-culture ± s.d.   % change in µ
Pseudo-nitzschia multiseries   PC9*           0.69 ± 0.03       1.06 ± 0.05           35
                                              0.59 ± 0.02       0.87 ± 0.03           32
                                              0.70 ± 0.01       0.95 ± 0.05           26
                                              0.75 ± 0.03       0.94 ± 0.04           20
                                              0.72 ± 0.01       0.89 ± 0.02           19
                               GGA2           0.53 ± 0.01       0.70 ± 0.02           24
                               PC4            0.47 ± 0.02       0.48 ± 0.01           2
                               IOES-1         0.55 ± 0.03       0.58 ± 0.02           5
Thalassiosira pseudonana       CCMP 1335      0.98 ± 0.00       0.98 ± 0.01           0

* Growth rate change ranged from 19–35% over five separate experiments.
Standard deviation values were calculated from biological triplicates except for IOES-1 (n = 4).
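
The table does not state how "% change in µ" was computed. As a consistency check only, the tabulated percentages for the rows above appear to match the difference between the two growth rates expressed relative to the co-culture rate, rounded to the nearest integer. The sketch below tests that reading against the tabulated means. The formula is an inference from the numbers, not a method given in the source, and the function name is hypothetical.

```python
def percent_change(mu_axenic, mu_coculture):
    """Difference in growth rate as a percentage of the co-culture rate.

    Assumed formula (inferred from the tabulated values, not stated in
    the source): 100 * (mu_coculture - mu_axenic) / mu_coculture.
    """
    return 100.0 * (mu_coculture - mu_axenic) / mu_coculture


# (mu_axenic, mu_coculture, tabulated % change) taken from the table above.
rows = [
    (0.69, 1.06, 35), (0.59, 0.87, 32), (0.70, 0.95, 26),
    (0.75, 0.94, 20), (0.72, 0.89, 19), (0.53, 0.70, 24),
    (0.47, 0.48, 2), (0.55, 0.58, 5), (0.98, 0.98, 0),
]
for mu_ax, mu_co, tabulated in rows:
    print(mu_ax, mu_co, round(percent_change(mu_ax, mu_co)), tabulated)
```

Because the published means are themselves rounded, agreement to the nearest integer is the most that can be checked here.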
Extended Data Table 4 | The effect of single IAA additions on the growth of P. multiseries GGA2

IAA concentration   Mean growth rate ± s.e.
0 nM                0.53 ± 0.01
1 nM                0.53 ± 0.01
50 nM               0.58 ± 0.01*
100 nM              0.58 ± 0.01*
250 nM              0.52 ± 0.02
10 µM               †

Standard error was calculated from n = 6 cultures.
* Indicates statistically significant growth rate enhancement relative to 0 nM IAA.
† Growth rate could not be calculated, as this concentration was inhibitory.
The hypoxic cancer secretome induces
pre-metastatic bone lesions through lysyl oxidase 


Thomas R. Cox", Robin M. H. Rumney’, Erwin M. Schoof*, Lara Perryman’, Anette M. Hoye’, Ankita Agrawal’, 
Demelza Bird’, Norain Ab Latif?, Hamish Forrest’, Holly R. Evans’, Iain D. Huggins’, Georgina Lang’, 


Rune Linding'*, Alison Gartland** & Janine T. Erler!?* 


Tumour metastasis is a complex process involving reciprocal inter- 
play between cancer cells and host stroma at both primary and sec- 
ondary sites, and is strongly influenced by microenvironmental 
factors such as hypoxia’. Tumour-secreted proteins play a crucial role 
in these interactions”* and present strategic therapeutic potential. 
Metastasis of breast cancer to the bone affects approximately 85% 
of patients with advanced disease and renders them largely untreat- 
able®. Specifically, osteolytic bone lesions, where bone is destroyed, 
lead to debilitating skeletal complications and increased patient mor- 
bidity and mortality®’. The molecular interactions governing the 
early events of osteolytic lesion formation are currently unclear. 
Here we show hypoxia to be specifically associated with bone relapse 
in patients with oestrogen-receptor negative breast cancer. Global 
quantitative analysis of the hypoxic secretome identified lysyl oxidase 
(LOX) as significantly associated with bone-tropism and relapse. 
High expression of LOX in primary breast tumours or systemic deliv- 
ery of LOX leads to osteolytic lesion formation whereas silencing or 
inhibition of LOX activity abrogates tumour-driven osteolytic lesion 
formation. We identify LOX as a novel regulator of NFATc1-driven 
osteoclastogenesis, independent of RANK ligand, which disrupts nor- 
mal bone homeostasis leading to the formation of focal pre-metastatic 
lesions. We show that these lesions subsequently provide a platform 
for circulating tumour cells to colonize and form bone metastases. 
Our study identifies a novel mechanism of regulation of bone home- 
ostasis and metastasis, opening up opportunities for novel thera- 
peutic intervention with important clinical implications. 

Using a primary tumour hypoxic signature’®, retrospective analysis
of a cohort of lymph-node-negative breast cancer patients who received
no systemic adjuvant therapy” revealed a significant association with
metastasis specifically within oestrogen receptor (ER)-negative (ER−)
but not ER-positive (ER+) patients (Fig. 1a). Moreover, analysis of
metastatic site showed significant association with bone metastases
over lung, liver and brain (Fig. 1b and Extended Data Fig. 1a). We
performed global differential quantitative mass-spectrometry-based
proteomic analysis of the hypoxic secretome associated with
osteotropism using the human ER− MDA-MB-231 parent and matched
bone tropic (clone 1833) (MDA-BT)!° breast cancer cell lines
(Fig. 1c). LOX was one of the most highly upregulated secreted proteins
in MDA-BT cells (Fig. 1d, Supplementary Information 1 and Extended
Data Fig. 1b-d). Querying the publicly available data sets for the full
panel of MDA-MB-231 clonal lines’®, which exhibit differing levels of
osteotropism, we found LOX was significantly associated with
increasing osteotropism (Fig. 1e). LOX has previously been strongly
implicated in cancer metastasis”"’~”’, identifying it as an important candidate
for further investigation.

Retrospective analysis of LOX in our patient cohort’ confirmed it is
significantly associated with metastasis in ER− patients but not ER+
breast cancer patients (Extended Data Fig. 2a, b), consistent with
previous reports'’. Furthermore, LOX is significantly associated with
reported bone relapse across all patients and in ER−, but not ER+, patients
(Fig. 1f). Cox-regression analysis showed LOX is associated with
increased hazard ratio in ER− patients for metastasis in general and
bone relapse (Extended Data Fig. 2c). Receiver operating characteristic
(ROC) analysis showed LOX is indicative of metastatic dissemination
(including to the bone) in ER− but not ER+ breast cancer (Extended
Data Fig. 3a). Importantly, our observations were confirmed in a
second data set'* (Extended Data Fig. 3b). Our findings strongly implicate
LOX in bone metastases in ER− breast cancer patients.

We further investigated our findings in the immune-competent 
4T1-BALB/c syngeneic model of spontaneously metastasizing ER−
breast cancer which expresses high levels of LOX’? (Extended Data 
Fig. 4a). The MDA-MB-231 lines are not suitable as a progression model 
since bone metastases do not occur from orthotopic implantation. 
Micro-computed tomography (micro-CT) analysis of bones from 4T1 
tumour-bearing mice showed decreased trabecular and cortical bone 
volume, trabecular number and trabecular thickness, and increased focal 
osteolytic lesions over time (Fig. 2a–c and Extended Data Fig. 4b–e).
Significant changes were detectable from 2 weeks after implantation 
when tumour hypoxia is a salient feature (Extended Data Fig. 4f). 
Bone marrow explants and quantitative PCR with reverse transcription 
(qRT-PCR) confirmed osteolytic lesion formation preceded the arrival 
of tumour cells (Extended Data Fig. 4g-i). Strikingly, osteolytic lesion 
formation and cortical bone loss were also induced in a tumour-free 
model through injection of hypoxic tumour-conditioned media (‘CM’) 
(Fig. 2b, c)’. Our data show early osteolytic lesions are formed in the 
absence of tumour cells by hypoxia-induced tumour-secreted factors. 

To determine LOX-dependency, mice were implanted with 
4T1shLOX tumours, with decreased LOX expression and decreased 
LOX in sera (Extended Data Figs 4a and 5a). Micro-CT analysis revealed 
decreased osteolytic lesions in these mice (Fig. 2d, e), with no effect on 
primary tumour growth (Extended Data Fig. 5b). Immunological 
inhibition of LOX in 4T1 scrambled control (4T1scr) tumour-bearing 
mice with our antibody that binds specifically to LOX” and blocks 
enzymatic function’’, also decreased focal osteolytic lesion formation 
(Fig. 2f). Consistently, in tumour-free models, injection of 4T1shLOX 
CM generated fewer focal osteolytic lesions than 4T1scr CM (Fig. 2g). 

We confirmed our findings in another, previously published human 
colorectal cancer model with manipulated LOX expression’*. SW480 is 
a non-metastatic colorectal cancer cell line with low LOX expression, 
whose metastatic ability is increased by overexpression of wild- 
type LOX (+LOX) but not a catalytically inactive mutant (K320A) 
(+mutLOX)'*. SW480+LOX CM injection showed increased
frequency and size of osteolytic lesions compared with SW480+ 
mutLOX or SW480+EV CMs (Fig. 2h and Extended Data Fig. 5c). 


1Biotech Research and Innovation Centre (BRIC), University of Copenhagen (UCPH), Copenhagen, DK-2200, Denmark. 2Hypoxia and Metastasis Team, Cancer Research UK Tumour Cell Signalling Unit, The
Institute of Cancer Research, London SW3 6JB, UK. 3The Mellanby Centre for Bone Research, The University of Sheffield, Sheffield S10 2RX, UK. 4Cellular Signal Integration Group (C-SIG), Technical
University of Denmark (DTU), Lyngby, DK-2800, Denmark.
*These authors contributed equally to this work.
Figure 1 | Tumour-secreted LOX is a critical
Figure 2 | Osteolytic lesion formation in ER− mammary carcinoma models
is LOX dependent. a, Representative two-dimensional cross-sections of tibia
from control (top) and tumour-bearing (bottom) mice 3 weeks after orthotopic
implantation showing lesions (arrowheads) and loss of trabecular structure
(asterisks). b, Micro-CT analysis of osteolytic lesions in tumour-bearing and
tumour-free, CM-conditioned mice at 3 weeks (n: mice; control 3; 4T1scr
tumour 5; control injected 5; 4T1scr CM 5). c, Loss of cortical bone volume in
4T1scr tumour-bearing and tumour-free CM-injected models at 3 weeks
(n: mice; control 4; 4T1scr tumour 3; control injected 4; 4T1scr CM 6).
d, Representative three-dimensional reconstructions of tibiae showing
tumour-driven osteolytic lesions (arrowheads). e, LOX silencing decreases focal
osteolytic lesion formation (n: mice; control 5; 4T1scr tumour 8; 4T1shLOX
tumour 5). f, LOX inhibition decreases osteolytic lesion formation in
tumour-bearing models (n: mice; control 5; 4T1scr tumour + immunoglobulin G (IgG)
13; 4T1scr tumour + LOX Ab 14) and g, in tumour-free CM injection models
(n: mice; control 5; 4T1scr CM 5; 4T1shLOX CM 5). h, SW480 human
colorectal cancer lines with stably manipulated LOX expression (EV, +LOX or
+mutLOX) confirm LOX-dependency (n = 8 mice per condition).
i, Exogenous recombinant LOX (rLOX) drives osteolytic lesion formation in
nude and BALB/c models (n: mice; nude control 8; nude rLOX 8; BALB/c
control 7; BALB/c rLOX 6). b, c, e–i, Data are mean ± s.e.m. *P < 0.05,
**P < 0.01, ***P < 0.001, unpaired parametric one-tailed t-tests.
Figure 3 | Tumour-secreted LOX modulates osteoclasts and osteoblasts in
vitro and in vivo. a, rLOX (in the absence of RANKL) stimulates
osteoclastogenesis. b, c, rLOX generated osteoclasts exhibit high resorptive
ability; arrow, osteoclast; arrowheads, resorption tracks (a–c) (n: osteoclast count
from 12 independent osteoclast assays per group). d, rLOX induces nuclear
localization of the master transcription factor NFATc1 in the absence of RANKL
(n: osteoclast NFATc1 nuclear intensity; control 12; +RANKL 196, +rLOX 191
across three independent experimental repeats (donors) from 16 fields of view
per donor). e, LOX antibody treatment blocks NFATc1 localization. f, Catalase
treatment blocks rLOX-induced nuclear localization of NFATc1 (e, f, represent
data from 32 measurements of NFATc1 nuclear intensity from each of three
independent donors (96 total)). g, rLOX added to primary mouse calvarial
osteoblasts increases mineralization ability (DEX, dexamethasone) (n: alizarin
red S intensity; 16 wells per group across two independent experimental repeats).
h, i, 4T1scr mammary tumours decrease osteoblast number compared with
shLOX (n: mice; control 10; 4T1scr tumour 7; 4T1shLOX tumour 9) (h) and
increase osteoclast number on the endocortical surface of bone (per millimetre
bone perimeter) (n: mice; control 10; 4T1scr tumour 8; 4T1shLOX tumour 9)
(i). j, Representative images of osteoblasts (arrows) and osteoclasts (star) in
sections of bone. k, 4T1scr tumour-bearing mice treated with the anti-LOX
antibody show similar effects on osteoblast number and l, osteoclast number
to shLOX tumour-bearing mice (n: mice; 4T1scr tumour + IgG 5; 4T1scr
tumour + LOX Ab 7). m, Representative sections of bone from 4T1scr
tumour-bearing mice (with or without anti-LOX antibody or IgG control).
a, c–i, k, l, Data are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001,
unpaired parametric two-tailed t-test.

While bone metastases in patients with colorectal cancer are rare, our
data show that high levels of secreted active LOX drive focal osteolytic
lesion formation in the bone independently of tumour presence across
multiple cancer types. Injection of recombinant LOX (rLOX) into both
immune-compromised nude and immune-competent BALB/c mice
also led to the formation of focal osteolytic lesions (Fig. 2i) and
increased circulating carboxy terminal telopeptide (CTX), a biomarker
of bone turnover (Extended Data Fig. 5d). Our data clearly
demonstrate tumour-secreted LOX as a mediator of osteolytic lesions.

Bone homeostasis is a balance between bone resorption by
osteoclasts and bone formation by osteoblasts. This balance is typically
disrupted in cancer metastasis. Addition of rLOX (in the absence of
RANK ligand (RANKL)) to pre-osteoclast cultures was a highly
effective stimulator of osteoclastogenesis, generating greater numbers of
osteoclasts (Fig. 3a) with a higher resorptive capacity than
RANKL-stimulated cultures (Fig. 3b, c). Enzyme-linked immunosorbent assays
(ELISAs) for RANKL in rLOX-treated culture supernatants showed
no RANKL, ruling out autocrine production of RANKL, and mass
spectrometry analysis of rLOX preparations excluded the presence
of contaminating effectors (Extended Data Fig. 6a, b). Our data show
LOX can stimulate the generation of fully differentiated, active
osteoclasts independently of RANKL.

Osteoclastogenesis is driven by the nuclear translocation of
NFATc1, the master regulator of osteoclastogenesis'®. Addition of
rLOX (in the absence of RANKL) induced greater nuclear localization
of NFATc1 than RANKL (Fig. 3d and Extended Data Fig. 6c), which
was disrupted by treatment with our LOX-targeting antibody in a
dose-dependent manner (Fig. 3e and Extended Data Fig. 6c). A
by-product of LOX activity is the reactive oxygen species hydrogen
peroxide (H2O2). Reactive oxygen species have previously been suggested
to influence osteoclast differentiation and function’”’*. Treatment of
human pre-osteoclast cultures with rLOX in the presence of catalase
(which rapidly degrades H2O2) abrogated rLOX-driven NFATc1
nuclear localization in a dose-dependent manner (Fig. 3f). Our data
identify a novel, LOX-activity-dependent mechanism of de novo
osteoclastogenesis which occurs independently of RANKL. Addition
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Figure 4 | LOX-mediated lesions are osteoclast-driven and enhance 
circulating tumour cell colonization. a, Representative three-dimensional 
reconstructions of tibiae from tumour-bearing mice with or without 
bisphosphonate treatment. b, Tibial bone loss is abrogated in tumour-bearing 
mice treated with bisphosphonate (n: mice; control 5; 4T 1scr tumour 4; 4T Iscr 
tumour + bisphosphonate 4). c, Similar effects are observed in CM- 
conditioned models treated with bisphosphonate (n = 5 mice all groups). 

d, e, Quantification (d) and representative whole-body IVIS imaging (e) of 
intracardially injected 4T1Luc tumour cells after conditioning with 4T1scr or 
4T1shLOX CM. (n: mice; 4T1scr CM+IgG 8; 4T 1scr CM+LOXAb 8; 


of rLOX to primary calvarial mouse osteoblasts decreased proliferation 
and led to an increase in terminal differentiation, which was attenuated 
by our LOX blocking antibody (Fig. 3g and Extended Data Fig. 6d). 
Similarly, high LOX 4T1scr CM decreased proliferation and increased 
differentiation of the human osteoblast SaOS-2 cell line (Extended 
Data Fig. 6e, f), which was attenuated by treatment with our LOX 
antibody. Our data show LOX leads to a loss of proliferative phenotype 
and increased terminal differentiation of osteoblasts. 

Consistent with LOX tipping the balance of bone homeostasis in
favour of osteoclast resorption, quantification of osteoblasts and
osteoclasts on the endocortical surface of tibiae from tumour-bearing mice
showed decreased osteoblast and increased osteoclast number in
4T1scr tumour-bearing mice (Fig. 3h-j). Partial reversion was evident
in mice treated with our LOX antibody and in mice bearing 4T1shLOX
tumours (Fig. 3h-m and Extended Data Fig. 7a). Thus, tumour-secreted
LOX is an important modulator of bone homeostasis.
Treatment of tumour-bearing and CM-injected mice with clinically
relevant concentrations of the bisphosphonate zoledronic acid abrogated
focal osteolytic lesion formation (Fig. 4a-c) without affecting
primary tumour growth (Extended Data Fig. 7b). Our data highlight
the potential for therapeutic intervention in LOX-mediated
osteoclast-driven pre-metastatic lesion formation in the bone.

The functional consequence of LOX-mediated pre-metastatic focal 
osteolytic lesion formation was tested by pre-conditioning BALB/c 
mice with either 4T1scr CM with or without LOX antibody, or


4T1shLOX CM+IgG 10). f, Micro-CT lesion analysis of mice after intracardiac
injection following pre-conditioning (n: mice; 4T1scr CM+IgG 6; 4T1scr
CM+LOXAb 8; 4T1shLOX CM+IgG 8). g, Representative whole-body IVIS
imaging of 4T1Luc tumour cells at 1 and 5 weeks after intracardiac injection.
Mice were conditioned with hypoxic 4T1scr CM with and without
simultaneous treatment with bisphosphonate. White boxes in e and g denote the
tumour burden analysis region of interest. h, log2 quantification of g (n = 5
mice all groups). i, Schematic of LOX-mediated effects on bone homeostasis in
vivo. b-d, f, h, Data are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001,
unpaired parametric two-tailed t-test.


4T1shLOX CM to stimulate focal osteolytic lesion formation. 4T1 
luciferase-expressing tumour cells (4T1Luc) were then injected intra- 
cardially. Bioluminescent IVIS imaging and micro-CT analysis 
revealed increased tumour burden in 4T1scr CM-conditioned mice 
compared with LOX antibody treated or 4T1shLOX CM-conditioned 
mice (Fig. 4d-f). A Pearson correlation coefficient showed a positive 
correlation between IVIS signal and lesion number (Extended Data 
Fig. 7c). Our data demonstrate that LOX-mediated pre-metastatic 
focal osteolytic lesions generate niches within the bone microenviron- 
ment that support colonization of circulating tumour cells and the 
formation of overt bone metastases. Treatment with bisphosphonate 
during 4T1scr CM conditioning of mice significantly reduced the 
ability of intracardially injected 4T1Luc cells to colonize the bone 1 
week after injection and to develop bone metastases 5 weeks after 
injection (Fig. 4g, h). Thus, bisphosphonate treatment of patients with 
high-LOX-expressing tumours after surgery could prevent the estab- 
lishment and growth of circulating tumour cells within the bone. 
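The Pearson correlation between IVIS signal and lesion number mentioned above can be illustrated with a minimal sketch. The function name and the toy values in the usage note are hypothetical and are not data from this study; the code only shows how the coefficient is computed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

For example, `pearson_r(lesion_counts, ivis_signal)` on per-mouse values (both names are placeholders) returns a value between -1 and 1, with positive values indicating that mice with more lesions tend to show higher signal.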
Pre-metastatic preparation of secondary sites to facilitate sub- 
sequent tumour cell colonization has been reported by us and 
others**’” in several organs across multiple cancers; however, so far, 
the formation of pre-metastatic focal osteolytic lesions directly by 
tumour-secreted factors has not been described. We are the first to 
demonstrate, to our knowledge, that LOX activity modulates bone 
homeostasis, acting directly on osteoblasts and osteoclasts (Fig. 4i). 
Global bone loss before tumour cell arrival has been previously linked 
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to indirect effects of tumour-secreted factors through RANKL- 
dependent mechanisms”. We present a novel mechanism of dereg- 
ulation of bone homeostasis independent of RANKL, yet acknowledge 
that other factors probably contribute to this inherently complex pro- 
cess. Interestingly, the post-translationally cleaved LOX propeptide, 
known for its opposing inhibitory effects to that of the mature 
LOX enzyme”, has been reported to modulate osteoblast behaviour 
through an intracellular mechanism inhibiting mineralization 
ability. Yet embryonic day (E)18.5 Lox−/− mice exhibit markedly
decreased mineral nodule formation and osteoblast differentiation’, 
supporting our data and suggesting context-dependent effects. LOX is 
reportedly expressed in osteoblasts and induced by TGF-β released
during bone resorption”, which could further stimulate osteoclasto- 
genesis leading to unbalanced coupling of bone homeostasis and focal 
osteolytic lesions. Since LOX-dependent pre-metastatic lesions form 
in the bone at the same time as those formed in the lungs and liver’, we 
believe that these osteolytic lesions are independent, forming simulta- 
neously through unrelated mechanisms. 

Our data suggest LOX may well be a useful marker for predicting
the likelihood of metastases to the bone in ER− breast cancer
patients and identifying these patients for adjuvant bisphosphonate
treatment. Our data highlight that the dosing and administration of
LOX inhibitors under development will be critical as genetic
targeting yielded more potent effects than our antibody in this study,
and earlier work has shown that treatment with the non-specific LOX
inhibitor β-aminopropionitrile can reduce bone colonization in
intracardiac models when administered at the time of inoculation. In
summary, our insight into the very early mechanisms of bone
metastases before, and independent of, tumour cell arrival identifies a
new step in bone metastasis and novel opportunities for therapeutic
intervention.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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For in vivo experiments, sample size was estimated to be eight mice per treatment 
group to ensure more than 80% power with 95% confidence, based on 25% 
practical difference and 15% coefficient of variation. 
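
The sample-size statement above can be sanity-checked with a rough power calculation. The sketch below is not the authors' calculation: it assumes that "95% confidence" means a two-sided significance level of 0.05, treats the 25% difference divided by the 15% coefficient of variation as a standardized effect size, and uses a normal approximation rather than the exact noncentral t distribution, so it is slightly optimistic for small groups.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(n_per_group, diff_pct, cv_pct, alpha=0.05):
    """Approximate power for a two-sided comparison of two group means.

    Normal approximation only; the function name and defaults are
    illustrative assumptions, not taken from the paper.
    """
    z = NormalDist()
    effect = diff_pct / cv_pct                      # standardized difference
    noncentrality = effect * sqrt(n_per_group / 2)  # expected test statistic
    z_crit = z.inv_cdf(1 - alpha / 2)               # two-sided critical value
    # Probability of rejecting in the direction of the true effect.
    return z.cdf(noncentrality - z_crit)


power = approx_power_two_sample(n_per_group=8, diff_pct=25.0, cv_pct=15.0)
print(f"approximate power with 8 mice per group: {power:.2f}")
```

Under these assumptions the approximation gives a power above the 80% threshold for eight mice per group, which is consistent with the statement in the text.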

Patient data analysis. Evaluation of the expression of a previously published 
hypoxic signature® and LOX with respect to metastasis and organ specific relapse 
was conducted using a published cohort of 344 primary breast cancers from 
lymph-node-negative patients who had not received systemic adjuvant therapy 
and with available gene expression data and site of relapse information. Details on 
patients and gene expression analysis can be found in ref. 9. P values were derived 
from a Mann-Whitney test and were two-tailed. An additional Kruskal-Wallis 
test between reported bone relapse, relapse elsewhere and no relapse patients 
with an additional contrast test wherein all pairwise groups were considered 
was conducted for LOX expression. Cox-regression using log2(LOX expression
data) was used to estimate the hazard ratio in two analyses. One analysis used 
the no-relapse patients and the bone relapse patients, and the second analysis 
included all patients. An alternative second patient data set’* reporting data on 
295 lymph-node-negative patients who did not receive adjuvant therapy, with 
available site of relapse, was used to confirm our LOX-based findings. 
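
As an illustration of the two-tailed Mann-Whitney comparison described above, here is a minimal sketch in plain Python. It assumes a normal approximation without tie or continuity correction, so it is suited to illustration rather than to reproducing the published P values; the function name and any input data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-tailed P) by normal approximation.

    Tied values receive average ranks; no tie or continuity correction
    is applied to the variance.
    """
    pooled = sorted((value, group)
                    for group, values in enumerate((x, y))
                    for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        count_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += average_rank * count_x
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p
```

A call such as `mann_whitney_u(lox_no_relapse, lox_bone_relapse)` (placeholder names) would compare expression values between two patient groups. For real analyses an established statistics package with exact or tie-corrected P values is the safer choice.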

In vitro culturing of tumour cells. Unless stated otherwise in the following 
sections, all cell lines were routinely cultured in DMEM with 100 U ml−1 penicillin
and 100 µg ml−1 streptomycin, plus 10% FBS. For CM collection experiments,
cells were transferred to serum-free DMEM without phenol red and incubated at 
either 21% oxygen (normoxia) or 1% oxygen (hypoxia) for 24 h in a Hypoxystation 
(Don Whitley Scientific). All CMs were filtered before use. For SILAC for mass 
spectrometry studies, tumour cells were grown in DMEM containing labelled 
isotopic amino acids, either light isotope (12C6,14N4-arginine; 12C6,14N2-lysine)
(R0/K0) or heavy isotope (13C6,15N4-arginine; 13C6,15N2-lysine) (R10/K8) for
five passages before incorporation was assessed. A minimum of 97-98% labelled 
arginine and lysine incorporation, with less than 1% proline conversion, was 
required for subsequent proteomics studies. The MDA-MB-231 BT cell line was 
obtained from J. Massagué at the Memorial Sloan-Kettering Cancer Center. The 
4T1 wild-type cell line was obtained from F. Miller at the University of Michigan. 
The SaOS-2 cell line was obtained from J. Gallagher at Liverpool University. The 
MDA-MB-231 parental cell line was obtained from the American Type Culture 
Collection (ATCC) (distributed by LGC Standards), where cell lines are authenti- 
cated on a regular basis. The 4T1Luc line was from SibTech. The 4T1 wild-type 
cell line was used to generate the 4T1shLOX line as previously published’. The 
SW480 +EV, +LOX and +mutLOX cell lines were previously generated and 
authenticated using short tandem repeat analysis”. All cell lines were routinely 
tested for mycoplasma and tested negative for murine pathogens by IMPACT 
testing (IDEXX Laboratories). 

Mass spectrometry acquisition and secretome analysis. After collection, label- 
free and SILAC-labelled CMs were filtered and reduced in volume using 10 kDa 
molecular mass cut-off filters. The remaining protein was dissolved in 6 M urea, 
2 M thiourea and 10 mM HEPES pH 8, after which exact protein amounts were 
determined using a Bradford assay. In SILAC-labelled repeats, the two SILAC 
labels (R10/K8 and R0/K0) were mixed 1:1. In label-free repeats, the samples were
left unmixed as depicted in Fig. 1c, but equal amounts of starting material were 
used for processing. Proteins were reduced in 1 mM DTT (Sigma) for 45 min at 
room temperature (21°C), alkylated for 45 min using 5.5 mM chloroacetamide 
(Sigma), and digested with 1:50 (enzyme:protein ratio) of mass spectrometry 
(MS)-grade trypsin (Sigma) overnight at 37°C. Peptides were acidified with
trifluoroacetic acid at a final concentration of 2%, and 5 µg of peptides were loaded
onto a 50 cm C18 reverse-phase analytical column (Thermo EasySpray ES803) 
using an EASY nanoLC 1000. Peptides were eluted over a 4 h gradient ranging 
from 6 to 60% of 80% acetonitrile, 0.1% formic acid, and the Q-Exactive (Thermo 
Fisher Scientific) was run in a DD-MS2 top10 method. Full MS spectra were 
collected at a resolution of 70,000, with an AGC target of 3 X 10° or maximum 
injection time of 20 ms and a scan range of 300-1750 m/z. The MS2 spectra were
obtained at a resolution of 17,500, with an AGC target value of 1 X 10° or 
maximum injection time of 60 ms. Dynamic exclusion was set to 45 s, and ions 
with a charge state <2 or unknown were excluded. MS performance was verified 
for consistency by running complex cell lysate quality control standards, and 
chromatography was monitored to check for reproducibility. Raw data were pro- 
cessed using MaxQuant version 1.5 and Perseus version 1.4. Results were analysed 
using scripts written in-house in Python, and statistically tested for significance 
using the quantile function in the R statistical framework. To ensure high confid- 
ence identifications and quantification, a MaxQuant score of >50 and a minimum 
of two unique peptides per protein seen by tandem MS in all repeats were required. 
Initial analysis was undertaken using a label-free approach (two repeats) for 
global pairwise analyses, and data subsequently validated in a standard- and 
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reverse-label SILAC approach (two repeats). Identified intracellular contaminants 
were removed and secreted proteins retained by using the cellular compartment 
annotations in Ensembl and PantherDB, and Gene Ontology annotation enrich- 
ment for extracellular-associated terms. Raw mass spectrometry data along 
with in-house Python scripts are available online at ProteomeXchange
Consortium (http://proteomecentral.proteomexchange.org) with the data set identifier
PXD000397. 
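
The identification criteria described above (a MaxQuant score above 50 and at least two unique peptides per protein in every repeat) can be sketched as a simple filter. This is not the in-house script deposited with the data set; the `ProteinHit` record and its field names are hypothetical and only illustrate the stated thresholds.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical record for one quantified protein."""
    gene: str
    score: float                     # MaxQuant score
    unique_peptides_per_repeat: list  # one count per repeat


def passes_confidence_filter(hit, min_score=50.0, min_unique=2):
    """True if the hit meets the score and per-repeat peptide thresholds."""
    counts = hit.unique_peptides_per_repeat
    return (
        hit.score > min_score
        and len(counts) > 0
        and all(count >= min_unique for count in counts)
    )


def filter_hits(hits):
    """Keep only hits that pass the confidence filter."""
    return [hit for hit in hits if passes_confidence_filter(hit)]
```

Requiring the peptide threshold in all repeats, rather than on average, means a protein seen strongly in three repeats but weakly in the fourth is still excluded.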

Microarray data analysis of MDA-MB-231 cells. Previously published micro- 
array data for parental and in vivo selected osteotropic subclones of the human 
MDA-MB-231 breast cancer line were retrieved (GEO accession number 
GSE2603). The subclones have previously been described and shown to be either 
weakly, mildly or strongly osteotropic’®. All data sets were normalized and centred 
to the median of LOX probes. 

In vivo models. Before the start of experiments, mice were randomly allocated 
into cages. Each mouse within the same cage received the same treatment. Cages 
were subsequently randomly allocated for treatment. Sample size was estimated to 
be eight mice per treatment group to ensure >80% power with 95% confidence, 
based on 25% practical difference and 15% coefficient of variation. For
tumour-bearing studies, 2 × 10° 4T1scr or 4T1shLOX or 4T1Luc cells were injected into
the mammary fat pad of 8-week-old female BALB/c mice (Taconic). The
tumour-free model has been previously described: 300 µl of tumour-cell CM is
injected intraperitoneally daily into mice for 3 weeks. Rabbit anti-LOX (αLOX)
antibody or rabbit IgG control treatments were administered intraperitoneally
twice a week at 1 mg kg−1, starting 2 days after implantation. The LOX antibody
(synthesized by OpenBiosystems) targets a conserved peptide sequence from the active site
of human and mouse proteins, blocking function as previously described, and has
been shown not to bind other LOX family members. For bisphosphonate studies,
0.6 mg kg−1 zoledronic acid was injected intraperitoneally twice a week. Primary
tumour measurements were performed twice a week using callipers. All experi- 
ments were performed in accordance with UK Home Office regulations following 
UK Coordinating Committee for Cancer Research Guidelines for the Welfare and 
Use of Animals in Cancer Research, or under authorization and guidance from the 
Danish Inspectorate for Animal Experimentation. Cell line sources are stated above 
in ‘In vitro culturing of tumour cells’ and were tested negative for murine pathogens 
by IMPACT testing (IDEXX Laboratories). A second previously published human 
model of colorectal cancer with manipulated LOX expression’® was used to validate 
LOX-dependent findings. SW480 is a non-metastatic colorectal cancer cell line 
with low LOX expression, whose metastatic ability can be increased through 
expression of wild-type LOX (SW480+LOX) but not a catalytically inactive mutant
(K320A) (SW480+mutLOX). Eight-week-old female nude mice (Charles River)
were injected daily with 300 µl SW480+EV (empty vector), SW480+LOX
(full-length active LOX) or SW480+mutLOX (catalytically inactive mutant (K320A))
CM intraperitoneally for 3 weeks. Recombinant LOX (OriGene Technologies) in
PBS was injected twice a week (25 µg kg−1) intraperitoneally into 8-week-old
female nude (Charles River) or BALB/c mice (Taconic) for 3 weeks. Exclusion 
criteria for data analysis were pre-established such that those mice terminated 
before defined experimental endpoints for ethical and/or licence reasons such as 
undue pain, suffering, distress and/or apparent lasting harm, or unexpected
premature death, were not used for subsequent analysis. Values of n for all figures are
displayed in accompanying legends. 
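
Doses in this section are given per kilogram of body mass. As a minimal sketch of the arithmetic, the helper below converts a mg kg−1 dose into the amount given to one mouse. The 20 g body mass used in the usage note is an assumption for illustration and is not stated in the paper.

```python
def dose_per_mouse_ug(dose_mg_per_kg, body_mass_g):
    """Amount per animal in micrograms for a dose expressed in mg per kg.

    mg/kg multiplied by grams equals micrograms, because
    1 mg/kg = 1 µg/g.
    """
    if dose_mg_per_kg < 0 or body_mass_g <= 0:
        raise ValueError("dose must be non-negative and body mass positive")
    return dose_mg_per_kg * body_mass_g
```

For an assumed 20 g mouse, `dose_per_mouse_ug(1.0, 20.0)` gives 20 µg for the 1 mg kg−1 antibody dose, and `dose_per_mouse_ug(0.6, 20.0)` gives about 12 µg for the 0.6 mg kg−1 zoledronic acid dose.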

Micro-CT imaging of tibia. Legs were removed from euthanized mice and fixed 
in either 4% paraformaldehyde solution or periodate-lysine-paraformaldehyde 
fixative. Fixed bones were dissected free of tissue and scanned on a micro-CT 
scanner (model 1172 Skyscan) at 50 kV with a 0.5 mm aluminium filter using a
detection pixel size of 5 µm. The scanned images were reconstructed using
Skyscan Recon software and analysed using Skyscan CT analysis software. A 
standard trabecular volume of interest was chosen starting 0.2 mm from the 
growth plate and included all trabeculae in a 1 mm? region of bone. Trabecular 
volume and number were assessed in this region. Total bone volume was also 
determined in a length of the bone from the top of the epiphysis to 3 mm below. 
Osteolytic lesions were measured through 360° of the bone on a three-dimensional 
model in a 3 mm length of cortical bone, starting at the growth plate. Holes smaller 
than 50 µm in diameter were excluded from the analysis as these represent normal
physiological structures in bone. During analysis, investigators were blinded to 
specific treatment groups. 

In vivo quantification of osteoblast and osteoclast number. Tibiae were fixed in 
4% paraformaldehyde solution, decalcified in 14.3% EDTA for 4 days at 37°C with 
daily changes of EDTA, then embedded in paraffin wax. Sections were cut (at
3 µm) using a Leica Microsystems Microtome and stained with tartrate-resistant
acid phosphatase (TRAP) as described previously’. The numbers of osteoblasts 
and TRAP-positive osteoclasts were determined on a 3 mm length of endocortical 
surface starting 0.25 mm from the growth plate and viewed on a DMRB micro- 
scope (Leica Microsystems). All histomorphometric parameters were based on the 
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report of the American Society for Bone and Mineral Research histomorphometry 
nomenclature” and were obtained using the OsteoMeasure bone histomorpho- 
metry software (OsteoMetrics). During preparation and analysis of tibiae, inves- 
tigators were blinded to specific treatment groups. 

In vitro osteoclast and osteoblast models. Osteoclasts were generated on dentine 
disks from the CD14+ fraction of human peripheral blood as previously
described. The CD14+ cells were treated with 25 ng ml−1 recombinant
M-CSF (−RANKL), plus either 30 ng ml−1 RANKL (+RANKL) or 150 ng ml−1
recombinant LOX (rLOX) (OriGene Technologies). The LOX antibody was
added at 4 µg ml−1. At the end of the culture period the cells were fixed and
stained for TRAP. The number of TRAP-positive osteoclasts and the amount of
resorption were determined as previously described. For NFATc1 nuclear
localization, human peripheral blood monocytes were grown in standard osteoclastic
conditions and the ability of LOX to induce nuclear localization of the
transcription factor NFATc1 was measured at day 14 (mature, functional osteoclasts).
For the role of LOX upon NFATc1 nuclear localization, cultures were treated
for 24 h with rLOX; rLOX in the presence of the LOX antibody at 4 µg ml−1; or
LOX antibody alone at 4 µg ml−1. To determine whether LOX-induced NFATc1
nuclear localization was mediated by reactive oxygen species, additional cultures
were treated with 0, 50, 100, 150 and 200 U ml−1 catalase with and without rLOX
(150 ng ml−1). Primary murine calvarial osteoblasts were isolated from neonatal
BALB/c mice as previously described, and seeded into 96-well plates. To
determine the effect of LOX on the differentiation and function of primary osteoblasts,
cells were grown to confluence in normal medium (DMEM GlutaMAX with
sodium pyruvate without phenol red, 100 U ml−1 penicillin and 100 µg ml−1
streptomycin, 10% FBS), and then switched to osteogenic medium (DMEM
GlutaMAX with sodium pyruvate without phenol red (Life Technologies),
100 U ml−1 penicillin and 100 µg ml−1 streptomycin, 0.5% FBS and 50 µg ml−1
L-ascorbic acid (Sigma)) and treated with 10 nM dexamethasone (positive
control), 150 ng ml−1 rLOX, or rLOX + LOX antibody. Cells were cultured in
osteogenic medium for 3 weeks with the medium and treatments replaced every
2-3 days, and 5 mM inorganic phosphate added to all treatments 3 days before the
end of the culture. Human osteoblast-like cells (SaOS-2), maintained as
previously described, were treated with CM from 4T1 cells as previously described
and the effect after 3 days on cell number was measured. SaOS-2 cells were also 
grown in osteogenic medium and the effect of the CM on the differentiation of 
these cells and their ability to mineralize was assessed after 7 days by quantifica- 
tion of alizarin red staining. 

Quantification of mineralization. Cells were rinsed in PBS and fixed in 100% 
ethanol overnight at 4°C. Nodules formed by osteoblasts were stained with alizarin
red S. Briefly, cells were rinsed twice with PBS and incubated in 40 mM alizarin red S
(pH 4.2) (Sigma) for 1 h at room temperature. Plates were washed with 95% 
ethanol on the shaker until the solution became clear; 10% cetylpyridinium chlor- 
ide was then added to the wells and incubated at 55°C for 15 min, after which the 
absorbance was read at 550 nm. 

NFATc1 staining and quantification. Cultures on coverslips were fixed with 4%
paraformaldehyde for 15 min. Fixed cultures were rinsed three times between each 
subsequent step with 0.1% Tween in PBS and all incubations were at room tem- 
perature unless otherwise stated. Permeabilization was performed with 0.1% 
Triton X-100 in PBS for 10 min. Blocking was performed with 5% normal goat 
serum in PBS+0.1% Tween for 2 h. Mouse monoclonal antibody to NFATc1
(SC-7294, Santa Cruz) and mouse IgG1 control were diluted 1:50 in 5% normal goat
serum + 0.1% Tween and incubated at +4°C overnight. The secondary incuba- 
tion was with Alexa Fluor 488 goat anti-mouse 1:300 in PBS for 1 h. The final 
incubation with rhodamine phalloidin 1:40 (R415, Invitrogen) and Hoechst 
1:1,000 was for 20 min. Coverslips were mounted using ProLong Gold (Life 
Technologies). Images were captured with a Leica DMI 4000B fluorescence
microscope at ×20 objective with 0.70 aperture. NFATc1-positive nuclei were counted
with ImageJ. Each experiment was conducted with three independent donors. For
each donor, each treatment group was set up in duplicate. For each duplicate, a
minimum of 16 fields of view were quantified for nuclear NFATc1 signal.

Bioluminescent intravital imaging of bone colonization. Adult female BALB/c
mice (8 weeks old) were conditioned as described above. After 3 weeks of con- 
ditioning, mice were anaesthetized and 1 X 10° 4T1Luc cells were injected intra- 
cardially into the left cardiac ventricle. Once a week, mice were injected with 
120 mg kg−1 luciferin and metastatic dissemination of the cells was monitored
using IVIS Lumina II (Caliper LifeSciences). Mice were killed by CO2 asphyxiation 


3-5 weeks after tumour cell injection. Metastatic burden was quantified using 
Living Image software (Caliper Life Sciences) by measuring the luminescent signal 
from each leg; regions of interest are shown in Fig. 4. During analysis of IVIS data, 
investigators were blinded to specific treatment groups. 

Immunoblotting. Immunoblotting for lysyl oxidase was as previously 
described*’. Conditioned media and cellular lysates were prepared as previously 
described**. Primary LOX antibody (Open Biosystems) was used at 1:100 and 
β-actin (Abcam) at 1:10,000, with incubation overnight at 4°C. Species-specific
biotinylated secondary antibodies were used at 1:25,000 and incubated for 1 h at 
room temperature, and visualization performed using ECL Plus (Amersham, GE 
Healthcare). 

Immunohistochemistry for tumour hypoxia. Mice were injected intraperitoneally
with pimonidazole (60 mg kg−1) 1 h before culling. After excision,
tumours were fixed in 4% PFA overnight before processing and embedding
in paraffin according to standard histopathology techniques. Sections (4 µm)
were cut and deparaffinized, rehydrated and stained with Hypoxyprobe 
(Hypoxyprobe) overnight after citrate-buffer-mediated antigen retrieval accord- 
ing to the manufacturer’s guidelines. Hypoxyprobe binding was visualized with 
3,3′-diaminobenzidine before counterstaining with haematoxylin. Images were
taken on a NanoZoomer slide scanner (Hamamatsu). 

Explant cultures. The 4T1Luc line was implanted orthotopically as described 
above. Explant cultures of 4T1 tumour-bearing mice were generated at 1, 2, 3, 4 
and 5 weeks after implant in the following ways. From primary tumour, small 5 
mm?’ biopsies were taken and mechanically disaggregated to produce a single cell 
suspension. From lung, the left lobe was removed, washed and mechanically 
disaggregated to produce a single cell suspension. From bone, hindlimbs were 
separated at the joint and all extraneous tissue removed. Tibiae were opened at
both ends and bone marrow as well as tumour cells were flushed by syringe three
times using PBS. From skin, a small 5 mm” punch of distant skin was mechanically 
disaggregated to produce a single cell suspension. Collected cells were washed and 
plated in standard serum containing media. Forty-eight hours after seeding, media 
were changed to remove non-adhered cells and 500 µg ml−1 Zeocin (the selective
marker for the luciferase cassette) was added for 2 weeks. D-Luciferin salt (Caliper
Life Sciences) at a final concentration of 3 mg ml−1 was added just before
bioluminescent imaging using the IVIS Lumina II, and quantification of luminescent
signal used Living Image software (Caliper Life Sciences). 

qRT-PCR. Total RNA was isolated from cells using TRIzol (Invitrogen) and
purified RNA treated with DNase I (New England Biolabs), both according to
the manufacturer’s instructions. Complementary DNA synthesis was performed
using an M-MLV Reverse Transcriptase Kit (Invitrogen). qRT-PCR for β-actin
and firefly luciferase was performed using a LightCycler 480 (Roche). Firefly
luciferase was amplified using the primers 5′-CTCACTGAGACTACATCAGC-3′
and 5′-TCCAGATCCACAACCTTCGC-3′, and β-actin using
5′-GAGGCCCAGAGCAAGAGAGG-3′ and 5′-TACATGGCTGGGGTGTTGAA-3′.

ELISA. ELISA plates were coated with sera from 4T1scr and 4T1shLOX tumour- 
bearing mice at 4°C overnight. Plates were blocked with 1% BSA at 37°C for 3 h. 
Our anti-LOX antibody was prepared in PBS containing 0.1% BSA, and 100 µl was
added to wells for 2 h at room temperature. Binding of anti-LOX antibody to LOX 
protein was detected using horseradish peroxidase (HRP)-labelled secondary anti- 
bodies (1:10,000 dilution). The CTX-I ELISA (RatLaps) was used for quantitative 
determination of bone-related degradation products from CTX of type I collagen 
in mouse serum released by osteoclasts. All procedures were performed in accord- 
ance with the manufacturer’s guidelines using sera from animals taken at time of 
cull. A sandwich ELISA for detecting human RANKL in the osteoclast medium 
was used according to the manufacturer’s instructions (DuoSet, R&D Systems 
Europe). The detection range of the RANKL ELISA was 78.1-5,000 pg ml−1.

Extended Data Figure 1 | LOX is hypoxia regulated and strongly associated
with osteotropism and metastasis. a, Retrospective analysis of our patient
cohort including only ER− patients showed that the hypoxic signature is not
significantly associated with liver relapse (P = 0.98), brain relapse (P = 0.17)
or lung relapse (P = 0.13). b, log2 expression levels under conditions of
hypoxia (1% O2) and normoxia (21% O2) for secreted proteins from the
MDA-MB-231 parent and MDA-MB-231 bone tropic (BT) 1833 cell line. Data
representative of four repeats, two label-free repeats, and two SILAC
(standard and reverse-label) repeats. Acquisition performed on the Orbitrap
Q-Exactive (Thermo Fisher Scientific). c, Overlaps between repeats of global
secretome analysis in MDA-MB-231 parent and MDA-MB-231 bone tropic
cells grown in normoxic (21% O2) and hypoxic (1% O2) conditions from
label-free and SILAC approaches. d, Immunoblotting for LOX in MDA parent
and 1833 bone tropic subclone under conditions of hypoxia (1% O2) and
normoxia (21% O2) confirming expression levels seen in proteomic and
transcriptomic analyses. Scans of original western blots available as
Supplementary Information.

Extended Data Figure 2 | Extended patient data analysis. a, Across all breast
cancer patients, the expression of LOX is associated with metastasis formation
(P = 0.023) and in particular with ER− breast cancer patients (P = 0.0029).
b, An additional Kruskal-Wallis test between reported bone relapse, relapse
elsewhere and no relapse patients with an additional contrast test wherein all
pairwise groups were considered shows that in all patients LOX expression is
associated with bone relapse compared with no relapse (P = 0.0389). This also
pertains to ER− patients (P = 0.0126) but not ER+ patients (P = 0.9537).
c, Cox-regression using log2(LOX expression data) was used to estimate the
hazard ratio in two analyses. One analysis used the no-relapse patients and the
bone relapse patients (data belonging to Fig. 1f), and the second analysis
included all patients (data belonging to Extended Data Fig. 2a). LOX expression
is associated with increased hazard ratio, particularly in ER− patients in both
analyses.

Extended Data Figure 3 | Additional patient data analysis in a supporting
patient cohort. a, ROC curve analysis shows LOX expression may be indicative
of metastatic dissemination of ER− breast cancer (area under the curve 0.77,
P < 0.0001) but not ER+ patients (area under the curve 0.55, P < 0.1504). b, In
an alternative second patient data set (PubMed identifier 12490681) reporting
data on 295 lymph-node-negative patients who did not receive adjuvant
therapy, with available site of relapse, LOX is significantly higher expressed in
bone relapse ER− patients, compared with other groups confirming data from
the original data set.


©2015 Macmillan Publishers Limited. All rights reserved 


[Figure panels a-i (image not reproduced). Recoverable labels: immunoblot for LOX and β-actin with molecular mass markers (40-100 kDa); x-axis "Weeks post-implant" (control, 1-4); explant sites: primary tumour, lung, bone, distant skin.]
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stimulates osteolytic lesion formation in the absence of tumour cells. 

a, Immunoblotting of 4T1 mammary carcinoma line stably expressing either a 
scrambled (scr) or shLOX vector which leads to a significant decrease in levels 
of detectable LOX. Scans of original western blots available in Supplementary 
Information. b, Micro-CT scanning and reconstruction with structural analysis 
shows decreases in trabecular bone volume (as a percentage of total bone 
volume) (n = 3 mice per group). c, Decreases in trabecular number (per 
millimetre) (n = 3 mice per group) and d, decreases in trabecular thickness in 
tibiae of mice bearing 4T1scr mammary fat pad tumours over time (n = 3 mice
per group). e, Micro-CT analysis of mouse tibiae shows that focal
osteolytic lesions in 4T1scr tumour-bearing mice increase over time (n = 3
mice per group). *P < 0.05, **P < 0.01, ***P < 0.001, unpaired parametric
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one-tailed t-test. f, Representative immunohistochemical staining for 
pimonidazole (Hypoxyprobe) in 4 µm section of 4T1 orthotopic mammary
carcinoma, 3 weeks after implantation, shows hypoxia (brown staining) as a
salient feature of tumours. Scale bar, 250 µm. g, Bioluminescent imaging of
luciferase signal 2 weeks after explant of samples taken from primary tumour,
lung, bone marrow (tibia) and distant skin samples at 1-5 weeks after primary
tumour implant. Selection was under 500 µg ml−1 zeocin for the luciferase
expression cassette (n = 3 mice per time point). h, Quantification of g as a
percentage of positive luciferase expressing explants from various sites after
4T1 tumour implant shows tumour cells do not begin to arrive in the bone until
3 weeks. i, qRT-PCR detection of luciferase expressing 4T1 tumour cells in
secondary organs confirms explant culture experiments (n = 3 mice per 
time point).
Extended Data Figure 5 | Effects of LOX modulation on circulating sera
levels, primary tumour growth and osteolytic lesion formation. a, ELISA for 
LOX in the sera of 4T1scr and 4T1shLOX tumour-bearing mice (n: ELISA 
signal (arbitrary units) in mouse sera: 3 mice per group) shows decreased levels 
of circulating LOX upon genetic silencing at the primary tumour. *P < 0.05, 
unpaired parametric two-tailed t-test. b, Growth curves as determined by 
calliper measurement for orthotopic 4T1scr and 4T1shLOX mammary 
tumours show no difference between primary tumour growth (n: mice; 3 per 
group). c, Injection of hypoxic CMs from SW480 human colorectal cancer cells 
stably expressing one of: empty vector control (EV), full-length LOX (+LOX),
or a catalytically inactive full-length LOX (+mutLOX) (K320A) confirms a
LOX-dependent mechanism of focal osteolytic lesion generation in a second 
human model of cancer (n: mice; 8 per group). **P < 0.01, ***P < 0.001, 
unpaired parametric two-tailed t-test. d, CTX ELISA (RatLaps) on sera of mice 
injected intraperitoneally twice a week with rLOX for 3 weeks. CTX is a 
telopeptide that can be used as a biomarker in the serum to measure the 

rate of bone turnover (n: nanograms per millilitre circulating CTX-I in mouse 
sera; 5 mice per group). All data are mean ± s.e.m. *P < 0.05, **P < 0.01,
***P < 0.001, unpaired parametric two-tailed t-test.
Extended Data Figure 6 | LOX modulates osteoclast and osteoblast
behaviour independently of RANKL. a, ELISA for RANKL in the CM of 
osteoclast cultures shows no detectable levels of RANKL in M-CSF alone 
(negative control) and rLOX cultures excluding the likelihood of autocrine 
production by cells in response to rLOX (n: ELISA signal (arbitrary units); data 
are from three independent experimental repeats in all groups). *P < 0.05, 
unpaired parametric two-tailed t-test. b, All proteins detected by mass
spectrometry analysis in the rLOX preparations (based on MaxQuant 1.5
peptide identity score of 50 and a minimum of two unique MS peptide
observations). c, Examples of nuclear localization of NFATc1 after addition of
rLOX in the presence and absence of the LOX antibody (green, NFATc1; red,
phalloidin; blue, DAPI). d, Representative alizarin red S plate showing 
mineralization ability (calcium deposits as detected by alizarin red S staining) of 
primary calvarial mouse osteoblasts after treatment with dexamethasone
(positive control) or rLOX + LOX Ab; quantification shown in Fig. 3g. e, High-
LOX-containing hypoxic 4T1scr CM significantly reduces cell proliferation of 
the human osteoblast-like SaOS-2 cell line, which can be partly blocked by 
treatment with anti-LOX antibody (n: normalized cell number per well; control 
24 wells; 4T1scr CM 49 wells; 4T1scr CM + LOX Ab 51 wells). Data collected 
over three independent experimental repeats. *P < 0.05, **P < 0.01, 

***P < 0.001, unpaired parametric two-tailed t-test. f, Mineralization ability 
(calcium deposits as detected by alizarin red S staining) is increased in the 
human osteoblast-like SaOS-2 cell line in response to high-LOX-expressing 
hypoxic 4T1scr CM, the effects of which can be attenuated using the anti-LOX 
antibody (n: alizarin red S staining per well, data taken from three independent 
repeats; control 18 wells; 4T1scr CM 9 wells; 4T1scr CM + LOX Ab 9 wells). All
data are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, unpaired
parametric two-tailed t-test.
and primary tumour growth. a, 4T1scr tumour-bearing mice treated with our
LOX antibody show a decrease in osteoclast perimeter in tibial bones in
support of LOX as a modulator of osteoclastogenesis shown in Fig. 3
(n: mice; 4T1scr Tumour + IgG 5, 4T1scr Tumour + LOX Ab 7). Data are
mean ± s.e.m. *P < 0.05, unpaired parametric two-tailed t-test. b, Weekly
tumour volumetric measurements for 4T1scr tumour-bearing mice treated
show that, when administered alone, zoledronic acid does not affect primary
4T1scr primary tumour growth in vivo (n: mice; 4 in all groups). c, Pearson
correlation shows a positive correlation between lesion number as determined
by micro-CT analysis and luciferase signal (radiance (photons per second
per square centimetre per steradian)) from 4T1Luc tumour cells within the
bone (r = 0.58, 95% CI 0.2778-0.7834, P = 0.0009 (two-tailed)).

Figure 1 | Pre-metastatic niche formation in bone. a, Cox et al. find that breast tumour cells that are
exposed to hypoxic conditions secrete the enzyme lysyl oxidase (LOX) into the bloodstream. b, In bone, 
LOX activates cells called osteoclasts to enhance bone breakdown, resulting in the formation of bone 
lesions. c, These lesions create a pre-metastatic niche: breast cancer cells from the original tumour that are 
disseminated by the circulation are able to occupy this niche and form a metastatic tumour. 


the authors demonstrate that such bone lesions 
can be created even in a tumour-free system: 
when they injected mice with factors secreted 
by hypoxic breast tumour cells, these soluble 
factors induced bone lesions that enhanced 
the formation of bone metastases by cancer 
cells circulating in the bloodstream. Thus, their 
study shows that systemic LOX, secreted by ER-negative
breast tumours, drives the formation of a pre- 
metastatic niche in bones, which precedes and 
facilitates the formation of metastases (Fig. 1). 

The pre-metastatic niche concept suggests 
that a hospitable microenvironment is formed 
in target organs before the arrival of metastatic 
tumour cells and enables their invasion, survival
and proliferation. Although the notion of
tumour cells as ‘seeds’ that require a fertile ‘soil’
for their growth was suggested more than a
century ago, the mechanisms that enable this
soil to be prepared have only emerged gradually
over recent years. It was not clear whether the
earliest changes in incipient metastatic niches
are accomplished systemically, by soluble factors
secreted from the primary tumour, or by
the presence of a small number of disseminated 
tumour cells, or through both processes. Cox 
and colleagues’ exciting discovery provides 
evidence supporting the systemic nature of pre- 
metastatic niche formation and contributes to 
our understanding of systemic regulation of 
cancer progression and metastasis. 

The study is limited by its use of only one 
model of transplantable mammary tumour, 
rather than a genetically engineered model in 
which the breast tumour arises in the mouse. 
However, there is a lack of models in immune- 
competent mice in which such tumours spon- 
taneously metastasize to bones. Another 
limitation is an interesting issue that remains 
unresolved: why is the secretion of LOX by 
hypoxic breast cancer cells predominantly 
linked with bone relapse in patients with ER-negative
breast cancer? Although hypoxia-related signalling
was previously shown to drive breast
cancer metastasis, a detailed dissection of the
link between breast cancer subtype, hypoxia 
and tumour-cell attraction to bone is yet to be 
performed. 

Elucidating the early interactions between 
disseminated tumour cells or their soluble 
products and their new microenvironment 
is an essential prerequisite for the develop- 
ment of effective targeted therapies. Target 
molecules are likely to be organ-specific, 
because the complex components and inter- 
actions of tissues vastly differ in different 
organs (such as bone versus brain). Adding 
to this complexity, this new study suggests 
that biomarkers that predict potential risk 
for organ-specific metastases are also specific
for different tumour subtypes.

Interestingly, several studies have indicated 
that drugs that prevent bone destruction (such 
as bisphosphonates and the monoclonal anti- 
body denosumab) are efficient co-therapies for 
preventing bone metastasis. Therefore, the
knowledge gained from Cox and colleagues’ 
findings may open new horizons in the treat- 
ment of patients with breast cancer after 
removal of the primary tumour. Analysis of the 
expression of LOX may provide both a molec- 
ular tool to stratify patients by their propensity 
for bone metastasis and a target for preventive 
treatment for patients at a higher risk of bone 
metastasis.

Proton smasher spots
rare particle decays 


The extremely rare decays of particles known as neutral B mesons have been 
observed at CERN’s Large Hadron Collider. The result may be a glimpse of 
physics beyond that of the standard model of particle physics. SEE LETTER P.68

For more than three decades, physicists
have been looking for the decay of the
‘strange’ B meson particle into a pair of
muons, the heavy cousins of electrons. The
process is incredibly rare, and harder to find
than the famous Higgs particle, the discovery
of which at the Large Hadron Collider
at CERN, near Geneva, Switzerland, was
celebrated worldwide in 2012. The standard
model of elementary particle physics makes
an exact prediction of the number of particle-decay
events researchers should observe in an
experiment. Anything more than the predicted 
value means potential trouble for the standard 
model. On page 68 of this issue, researchers 
working on the CMS and LHCb collaborations
at the Large Hadron Collider describe
a joint analysis of data from proton collisions 
that set the decay rate of the strange B meson 
at about three in one billion — in agreement 
with the standard-model prediction. However, 
they find that the decay rate of another type of 


neutral B meson, the ‘non-strange’ 
B meson, is at odds with the expec- 
tation from the standard model. 

The standard model is at a cross- 
roads. It has been very successful 
in describing elementary particles 
and their interactions, but such 
particles comprise only 4% of 
the known Universe. The theory 
does not provide a candidate for 
the dark matter that binds galax- 
ies together and makes up one- 
quarter of the cosmos. Nor does 
it accommodate dark energy, the 
remaining, unknown component 
of the Universe that is causing it to 
expand at an accelerated rate. It 
also does not explain the prepon- 
derance of matter over antimatter. 
Lastly, it carries a worrisome warning
that the Universe is probably
unstable, ready to collapse in a
‘big crunch’.

Many models have been proposed to solve 
some of these problems. One of the most com- 
pelling ideas for unknown physics beyond that 
of the standard model is supersymmetry,
affectionately called SUSY. Supersymmetry
states that, for every known particle, there is
a twin ‘superparticle’ of much higher mass.
These superparticles could in principle be 
produced in colliders. They should quickly 
decay to lighter superparticles and ordinary 
particles, except for the lightest superparticle, 
which should be stable — and that is SUSY’s 
candidate particle for dark matter. 

Physicists have been searching for SUSY 
superparticles for years, so far with no success. 
In the absence of direct observations, they 
watch for discrepancies of measurements 
of particle properties from standard-model 
predictions. The decay of neutral B mesons to 
muons (Fig. 1) is a sensitive test of the standard 
model because the model predicts the decay 
rate with good precision. B mesons are made 
up of one quark and a ‘bottom’ antiquark, the
antimatter partner to the quark; quarks are 
the elementary building blocks of protons and 
neutrons, and come in six flavours (up, down, 
strange, charm, top and bottom). There are 
two kinds of neutral B meson, which have 
no charge. One type, the strange B meson 
(B⁰ₛ), contains a bottom antiquark and a
strange quark. The other, the non-strange
B meson (B⁰), has a bottom antiquark paired
with a down quark. The decay of the neutral 
B mesons to a pair of muons would mean that 
the bottom antiquark and its quark partner 
annihilate, and that the energy released in the 
process is given to the muons. 

But the standard model forbids the annihila- 
tion of quarks of different flavours, so it pre- 
dicts the decay of the neutral B mesons into 
muons through an intermediate process that 
involves the exchange of a top quark between 
the quarks and the emission of two W bosons 


Figure 1 | The rare decay of neutral B mesons to muons. The CMS and
LHCb collaborations have accelerated and smashed together beams of
protons travelling in opposite directions in the Large Hadron Collider at
CERN, near Geneva, Switzerland, producing neutral B mesons, among many
other particles. The authors observed the extremely rare decay of the strange
neutral B meson (B⁰ₛ) to two oppositely charged muons (μ⁺ and μ⁻) with high
statistical significance. 


(elementary particles that mediate the weak 
nuclear force). The decay of the strange 
B meson is expected to occur by this process 
in about four parts in one billion, and that of 
the non-strange B meson in about one part in 
ten billion. However, if yet-unknown SUSY 
superparticles are exchanged between the 
quarks in addition to the top-quark exchange, 
these decay rates will be greatly enhanced rela- 
tive to the standard-model rate. 
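
As a rough illustration of what these rates mean in practice, the sketch below multiplies the standard-model rates quoted above by a number of produced mesons. The sample size of one trillion mesons is a hypothetical round number chosen only for the arithmetic, not a figure from the experiments, and the function name is illustrative.

```python
# Approximate standard-model rates as quoted in the text.
STRANGE_B_RATE = 4e-9       # about four dimuon decays per billion strange B mesons
NON_STRANGE_B_RATE = 1e-10  # about one dimuon decay per ten billion non-strange B mesons


def expected_decays(n_mesons, rate):
    """Expected number of dimuon decays among n_mesons produced mesons."""
    return n_mesons * rate


# Hypothetical sample size, assumed purely for illustration.
N_PRODUCED = 1e12

print(expected_decays(N_PRODUCED, STRANGE_B_RATE))
print(expected_decays(N_PRODUCED, NON_STRANGE_B_RATE))
```

Under these assumed inputs, the expectation is a few thousand strange and roughly a hundred non-strange decays before any detection losses, which gives a sense of why the experiments must sift through enormous numbers of collisions.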

The decay rate of the strange B meson 
observed by the CMS and LHCb collaborations 
confirms the standard-model prediction. That 
is good news for the standard model, but not 
such good news for physics beyond it. However, 
the decay of the non-strange B meson, which 
the authors also observed, albeit with a lower 
statistical significance than obtained for the 
strange B meson, exceeded the standard-model 
expectation by almost fourfold — something to 
watch in the years to come. 

CMS and LHCb are two of seven particle
detectors at the Large Hadron Collider. Their 
designs follow different concepts. CMS 
is a large cylinder (21.6 metres long and 
14.6 metres in diameter) in which two coun- 
ter-propagating beams of protons collide and 
give rise to neutral B mesons, among many 
other particles. LHCb is specifically designed 
to study B mesons, which tend to stay close 
to the line of the beam pipe. Unlike the CMS 
detector, which surrounds the proton collision 
point, the LHCb detector is a stack of instru- 
ments stretching for 20 metres along the beam 
pipe on one side of the collision point. But the 
two teams adopted a similar strategy to analyse 
their data. Both groups selected particle events 
that involved two oppositely charged muons 
travelling from a common point, which is dis- 
placed by a few hundred micrometres from the 
point at which the protons collide. The events 
associated with the decay of a neutral B meson
are a small fraction of initial candidates. The
rest are random pairs of muons
originating from other, more com- 
mon processes. 

To separate the signal of the 
neutral B meson from background 
events, the teams each built a ‘deci- 
sion tree’ — a sequence of binary 
splits of data into signal-like and 
background-like parts. The system 
‘learns’ to distinguish between sig- 
nal and background by ‘training’ 
on a simulated sample of the sig-
nal and on a sample of real data 
representing background events. 
For the selected signal-like events, 
the researchers deduced the mass 
of the parent particles using the 
momenta and directions of travel 
of the two muons. They then com- 
pared the spectrum of the deduced 
masses with that predicted for a 
sum of two bell-shaped curves 
corresponding to the two kinds of 
neutral B meson, strange and non-strange, and 
a smooth background. 
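
The mass reconstruction described above follows from energy and momentum conservation: the parent's mass is the invariant mass of the muon pair. The following is a minimal sketch of that calculation, not the collaborations' analysis code. The function names and the example momenta are assumptions made for illustration, in units where c = 1 and momenta and energies are in GeV.

```python
import math

MUON_MASS_GEV = 0.1056583745  # muon rest mass in GeV


def energy(p, mass=MUON_MASS_GEV):
    """Relativistic energy of a particle with momentum vector p = (px, py, pz)."""
    px, py, pz = p
    return math.sqrt(px * px + py * py + pz * pz + mass * mass)


def invariant_mass(p1, p2, mass=MUON_MASS_GEV):
    """Mass of the parent particle deduced from the momenta of its two decay muons."""
    total_e = energy(p1, mass) + energy(p2, mass)
    px = p1[0] + p2[0]
    py = p1[1] + p2[1]
    pz = p1[2] + p2[2]
    m_squared = total_e * total_e - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Hypothetical example: two muons emitted back to back, each with 2.68 GeV momentum.
print(invariant_mass((2.68, 0.0, 0.0), (-2.68, 0.0, 0.0)))
```

For this assumed pair the deduced mass is roughly 5.36 GeV, in the region where neutral B mesons sit. Repeating the calculation for every selected muon pair yields the mass spectrum that is then compared with the two bell-shaped curves and the smooth background.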

The two collaborations had previously 
performed this type of analysis, and each 
reported their results in separate publications.
But it was only the combination of
data from the two experiments that allowed 
the researchers to observe with high statistical 
significance the decay of the strange B meson. 
In the process, the researchers identified, and 
corrected, issues with the previous analyses. In 
particular, they isolated and subtracted a back- 
ground from the decay of a particle called a 
bottom Lambda baryon that mimics the signal 
of a neutral B meson.

Studies of B-meson decays will continue 
in the coming years. The Large Hadron Col- 
lider has just restarted after a two-year break 
for upgrades, and will soon accelerate proton 
beams to an energy of 13 teraelectronvolts 
(TeV), increased from the 8-TeV level reached 
before the upgrades. The proton beams will 
also be more tightly focused and will collide 
at a higher rate than that achieved so far. Both 
experiments will collect a large number of 
rare events, and should eventually find which 
path away from the standard model nature has 
chosen.

in the presence of SA11, P. multiseries increases
expression of genes associated with photo- 
synthesis and carbon fixation, presumably to 
support the higher growth rates observed, as 
well as to provide organic carbon exudates for 
the bacterial partner. 

But the emerging picture of a diatom- 
bacterium partnership is more complex than 
this simple resource swap. Amin et al. postulate 
that the exchanges between these free-living 
microbes are coordinated through cycling of 
the hormone indole-3-acetic acid (IAA) and 
the amino acid tryptophan. Best known for its 
use by terrestrial plants to direct developmen- 
tal processes such as the growth of new shoots, 
IAA also has a role in signalling between soil 
bacteria and plants. The researchers demonstrate
that P. multiseries and Sulfitobacter
SA11 secrete tryptophan and IAA, respectively. 
Moreover, they show that addition of synthetic 
IAA to cultures of P. multiseries stimulates the
diatom’s growth, but that the effect is signifi- 
cantly greater when the IAA-producing bac- 
terium itself is present. This indicates that, 
although IAA promotes diatom cell division, 
additional unidentified factors are involved in 
the positive feedback loop that results in major 
diatom growth enhancement. The authors 
also detected IAA in water samples from five 
North Pacific sites and present transcriptomic 
evidence from field samples for multiple IAA 
biosynthesis pathways, each incorporating dif- 
ferent precursor molecules. Thus, it seems that 
IAA signalling occurs across domains of life 
in both the terrestrial and marine biospheres 
and is probably an ancient mechanism of 
organismal communication. 

Amin and colleagues’ study represents a 
substantial step forward for understanding 
the complex network of interactions between 
phytoplankton and bacteria and provides a 
springboard for development of hypotheses 
on cross-talk between marine microbes. For 
example, the extreme interaction specific- 
ity observed suggests that the consortium of 
bacteria residing in a particular habitat may 
be a major force in structuring the local phytoplankton
community, or vice versa. Moreover,
it seems reasonable to speculate that, in addi- 
tion to IAA and tryptophan, other signalling 
molecules participate in inter- or intradomain 
communication among marine microbes. 

Perhaps the most pressing question we 
are left with is when and where such interac- 
tions occur. Symbioses between diatoms and 
nitrogen-gas-fixing cyanobacteria are 
known, and bacterial attachment to diatoms
is often reported during algal blooms
and in the bloom senescence phase. For
decades, scientists have also speculated on the
potential influence of microscale variability
of resources in marine environments and on
the significance of the phycosphere, a zone
around algal cells considered analogous to the 
root zone of plants. Concentrations of algal 
secreted compounds are much higher in the 


phycosphere than in nearby sea water, owing 
to the basic physics of the diffusive boundary 
layer surrounding the cell, thus promoting 
growth of bacteria in this layer.

Amin and colleagues’ results suggest that 
the specific growth-enhancing interactions 
observed occur in the phycosphere, but the 
sampling methods and quantitation tech- 
niques needed to directly assess the physical 
nature of these associations are still lacking. 
Exciting times are ahead as scientists develop 
techniques to examine the physical intricacies 
of these mutualistic relationships, their preva- 
lence and structuring roles in marine micro- 
bial communities, and how they might shift 
under environmental change.

Diversity in the
lymphatic vasculature 


Two studies of the cells that give rise to lymphatic vessels reveal that precursors 
arise from unexpected sources, demonstrating that the origins of this vasculature 
are more diverse than anticipated. SEE ARTICLES P.56 & P.62 


BENJAMIN M. HOGAN & BRIAN L. BLACK 


The lymphatic vasculature is a specialized
network that drains fluid from tissues
and enables immune-cell trafficking
and surveillance throughout the body. This 
major constituent of the circulatory system 
was long considered to be ancillary to the 
blood vasculature, and as such has received 
less attention than its counterpart. As a result,
although the roles of lymphatic vessels in tissue
maintenance and disease are now well appreciated,
their origins have remained contentious
for more than a century. Two studies in this
issue provide insights into the origins of the 
lymphatic vasculature. 

In vertebrate embryos, lymphatic vessels 
arise from two pre-existing veins. Vascular
endothelial cells that line the walls of these 
cardinal veins turn on genes that direct venous 
cells to become lymphatic endothelial cells 
(LECs), which leave the veins and form the 
lymphatic vessels. In this way, one vascular 
network gives rise to another. But precisely 
how and when a pool of LEC precursors is 
established in the cardinal veins, and whether
all cells in the early veins are identical, has
remained unclear. 

Zebrafish embryos are transparent, which 
makes them an ideal organism for visualizing 
cell movements. Nicenboim et al. (page 56)
traced early LEC development in zebrafish 
and unexpectedly discovered that the cells on 
the dorsal (upper) and ventral (lower) walls of 
the cardinal veins are different. The research- 
ers report that lymphatic precursors sprout 
from the dorsal wall, but that these cells actu- 
ally originate in the ventral wall at an earlier 
developmental stage (Fig. 1). 

By using fluorescence to track single cells, 
the authors found that precursor cells in the 
ventral wall divide, after which one of the 
two daughter cells migrates to the dorsal 
wall. These dorsal daughter cells can then 
give rise to LECs. Interestingly, the cells in 
the ventral wall also contribute to the intes- 
tinal blood vasculature, indicating that they 
can give rise to multiple types of endothe- 
lium. Nicenboim and colleagues showed 
that the fate of these nascent precursor cells 
is controlled by a signalling molecule, Wnt5b, 
which acts from tissues adjacent to the ventral
wall of the vein. This observation points to
an unexpected instructive role for ventral
tissues and Wnt signalling in driving cells to
become LECs.

Figure 1 | Contributions to lymphatic-vessel development. Lymphatic endothelial cells (LECs), which
derive primarily from veins in the early embryo, line lymphatic vessels. Nicenboim et al. report that LEC
precursors from the ventral side of embryonic veins divide, before one of the two daughter cells migrates
to the dorsal side of the vein. From there, these cells can differentiate into LECs and migrate to the site
of forming lymphatic vessels. Klotz et al. found that the LECs that form lymphatic vessels in the heart
originate both from embryonic veins and from other non-venous sources, including the yolk sac.

It is difficult to pinpoint the stage at which a 
cell becomes committed to its future identity, 
but Nicenboim and colleagues’ observations 
should prompt us to revise our thinking about 
how cellular identity is acquired in vascular 
lineages. Their work suggests that changes 
in cellular identity are probably coupled to 
dynamic cell movements, rather than to step- 
wise changes in gene expression, as current 
dogma suggests. One attractive idea is that the 
range of cell types that undifferentiated precur- 
sors in the vasculature can become is restricted 
progressively at sequential locations during 
vessel development. Future analyses should 
test this and other models. 

Klotz et al. (page 62) examined a later
stage of development, when lymphatic vessels 
form in the heart, which, like all organs, uses 
the lymphatic system to maintain normal 
fluid levels and for immune-cell trafficking.
An understanding of how lymphatic vessels 
develop in individual organs, however, has 
been even more enigmatic than a definition 
of the origins of the system as a whole. Indeed, 
although the cardiac lymphatic system has 
previously been described, its development,
origins and functions have remained unclear. 

These authors found that most of the 
cardiac lymphatic system in mice arises from 
embryonic veins located outside the heart. But 
unexpectedly, they show that many coronary 
LECs do not originate from veins at all (Fig. 1). 
By using various genetic ‘fate-mapping’ tech- 
niques to indelibly label cells, allowing cell 
movements and descendants to be traced, the 
authors observed that roughly 20% of LECs 
in the heart’s lymphatic vasculature originate 
in the yolk sac that surrounds the developing 
embryo. The researchers suggest that these 
yolk-sac cells have a previously unappreciated
potential to give rise to LECs directly, in
addition to their known roles in forming blood 
cells and vascular endothelial cells. 

All fate-mapping experiments in mice must 
be interpreted cautiously, because the tech- 
niques used to label cells genetically can also 
label other cell types. Nonetheless, Klotz and 
colleagues’ work indicates that cardiac LECs 
originate from different places: embryonic 
veins and at least one other source. Future 
studies that do not use genetic labelling should 
further clarify the origins of cardiac and other 
LECs in developing organs. 

Klotz et al. also investigated the role of 
cardiac lymphatic vessels in repairing the 
heart after a heart attack. They found that 
the injuries sustained from a heart attack in 
mice stimulate the growth of new lymphatic 
vessels, which in turn promote cardiac repair. 
These are previously unknown functions for 
coronary lymphatic vessels, and the authors’ 


findings have implications for the treatment 
of cardiac conditions such as heart attack and 
atherosclerosis. They raise the possibility that 
the activation of LECs in the heart may be a 
way to promote cardiac repair. 

These two studies provide fresh insight into 
our vascular drainage network. Taken together, 
they describe an early source of LEC precur- 
sors in embryonic cardinal veins, reveal an 
unexpected level of diversity in the signals 
that control early LEC development, and 
demonstrate that more cell types than previ- 
ously appreciated contribute to the formation 
of cardiac lymphatic vessels. In the light of 
these observations, the molecular and cellular 
processes that control lymphatic-vessel forma- 
tion seem to be much more diverse than was 
previously thought. The studies promise to 
invigorate future research into this enigmatic 
but vital vascular network.

Precise control of
localized signals 


The tumour-suppressor protein PTEN is mostly found in the cell cytoplasm, 
tethered to endosome vesicles. This localization regulates the enzyme’s activity 
towards specific lipids and influences its control of cell growth. 


VUK STAMBOLIC 


Loss of function of the PTEN tumour-suppressor
protein is a common feature
of many types of human cancer, including
glioblastoma and prostate, kidney, thyroid
and breast tumours. In addition to genetic
alterations occurring in sporadic tumours,
germline mutations in the gene that encodes 
PTEN cause the group of disorders known 
as PTEN hamartoma tumour syndromes, 
which are characterized by benign tumours 
throughout the body and increased incidence 
of breast, thyroid and brain cancers. Since the

Figure 2 | Orbital architecture. The satellite system of Pluto-Charon resembles some of the exoplanet
systems discovered by the Kepler space observatory. Pluto’s small moons orbit the system’s centre of mass 
clockwise; the exoplanets orbit their respective stars (Kepler 730 and Kepler 2169). For each system, the 
scale is set relative to the orbit of the innermost moon or planet (the relative scales vary across systems; 
the gap between Pluto and Charon is not on the same scale as the orbits of the moons). The dots indicate 
the relative positions of the moons or planets; the circles show their respective gravitational spheres of 
influence. Similarly to the exoplanets, the spheres of influence of Pluto's moons leave little space for other 
potential (as yet undiscovered) objects in intermediate orbits. 


satellite formation. Large fragments that
survived the giant impact, thought to have 
led to the creation of the system, might have 
irregular shapes; satellites grown from much 
smaller particles might be more rounded. The 
authors find that the ellipsoidal shapes of the 
two larger moons, Hydra and Nix, seem more 
consistent with grown satellites than with 
impact fragments. Their optical reflectivity, at 
40%, is similar to Charon’s (36-39%), but lower 


than Pluto's (50-65%, which is comparable to 
the reflectivity of sea ice). With a reflectivity 
of only 4-6%, Kerberos is as dark as coal and 
seems out of place with such bright compan- 
ions. Perhaps it is a dark fragment that was 
ejected during the giant impact. 

It is hoped that NASA’s New Horizons
spacecraft, due to fly by Pluto in July, will throw 
yet more light on these questions. Close-up 
images taken by the spacecraft will further 


Opening LOX 
to metastasis 


New findings implicate the enzyme lysyl oxidase (LOX), secreted by oxygen- 
deprived breast cancer cells, in inducing bone lesions that precede and facilitate 
the spread of the cancer cells to the bone. SEE LETTER P.106 


NETA EREZ 


Despite extensive research, breast cancer
remains one of the leading causes
of cancer-related deaths in women,
and mortality from breast cancer is almost 
exclusively a result of the tumour spreading to 
distant organs. Bones are the most common 
site of metastasis associated with breast cancer, 
affecting up to 80% of women with metastatic 
disease. Bone metastases are typically incur- 
able and encompass severe disease features, 
including pain, bone destruction, hypercalcaemia
and debilitating skeletal-related events.
In this issue, Cox et al. (page 106) establish a
mechanistic link between bone metastasis of 
breast tumours and expression of the enzyme 


lysyl oxidase (LOX) by breast cancer cells. 
Metastases in bones and other organs are 
typically diagnosed months or years after the 
initial diagnosis and removal of the primary 
tumour. This temporal lag is, at least in part, due 
to the fact that although disseminated tumour 
cells have cell-intrinsic survival and prolifera- 
tive programs, they must be able to manipulate 
tissue cells in the new and hostile microenviron- 
ment of the metastatic organ to support their 
growth**. The early molecular changes at the 
metastatic niche are the rate-limiting step of 
metastasis, and understanding the mechanisms 
that facilitate the formation of a hospitable niche
is a central challenge in cancer research. 
Hypoxia (lack of an adequate oxygen supply) 
in the primary tumour is generally associated 
constrain the sizes, shapes and reflectivities
of Nix, Kerberos and Hydra, but not of Styx 
— it is too small to be resolved in the images. 
The mission’s spectroscopic measurements 
of the relative abundances of various ices will 
probably yield a reflectivity for Styx, and allow 
comparison of the compositions of the satel- 
lites. If new satellites or rings of small particles 
are found, and their bulk properties estab- 
lished, this will provide additional information 
on the extent of the system. These much- 
anticipated observations will lead to improved 
theories of the formation and evolution of 
planets and their satellites. Linking all these 
results to ongoing observations of the growing 
population of known exoplanets will extend 
tiny Pluto’s reach far beyond the Solar System.
with increased metastases’. However, when
Cox and colleagues performed retrospec- 
tive analyses of hypoxic breast tumours from 
humans, they found that hypoxia was correlated
with increased bone metastases only in a
subtype of breast tumour that does not express
the receptor for oestrogen (ER⁻ tumours). In
an attempt to identify the factors underlying 
this specificity, Cox et al. analysed the proteins 
secreted by those breast cancer cells that were 
attracted to the bone and found that high levels 
of LOX were associated with bone metastases 
in ER⁻ breast tumours. LOX belongs to a family
of secreted proteins that crosslink collagen
fibres in the extracellular matrix (ECM),
which determines the strength and structural 
integrity of tissues®. LOX has been shown to 
contribute to metastasis of breast cancer to 
lungs by modifying the ECM at the metastatic 
niche®”, but it had not previously been impli- 
cated in regulating bone homeostasis. 

Using a transplantable mouse model of breast 
cancer that spontaneously metastasizes to bone, 
the authors demonstrate that LOX is secreted 
by hypoxic breast cancer cells and that it dis- 
rupts the balance between bone formation and 
destruction such that there is greater overall 
bone loss (resorption). These sites of damaged 
bone provide a favourable environment for dis- 
seminated breast cancer cells, thereby facilitat- 
ing the formation of bone metastases. Moreover, 
Figure 1 | Pre-metastatic niche formation in bone. a, Cox et al. find that breast tumour cells that are
exposed to hypoxic conditions secrete the enzyme lysyl oxidase (LOX) into the bloodstream. b, In bone, 
LOX activates cells called osteoclasts to enhance bone breakdown, resulting in the formation of bone 
lesions. c, These lesions create a pre-metastatic niche: breast cancer cells from the original tumour that are 
disseminated by the circulation are able to occupy this niche and form a metastatic tumour. 


the authors demonstrate that such bone lesions 
can be created even in a tumour-free system: 
when they injected mice with factors secreted 
by hypoxic breast tumour cells, these soluble 
factors induced bone lesions that enhanced 
the formation of bone metastases by cancer 
cells circulating in the bloodstream. Thus, their 
study shows that systemic LOX, secreted by ER⁻
breast tumours, drives the formation of a pre- 
metastatic niche in bones, which precedes and 
facilitates the formation of metastases (Fig. 1). 

The pre-metastatic niche concept suggests 
that a hospitable microenvironment is formed 
in target organs before the arrival of metastatic 
tumour cells and enables their invasion, sur- 
vival and proliferation®. Although the notion of 
tumour cells as ‘seeds’ that require a fertile ‘soil’ 
for their growth was suggested more than a 
century ago”””, the mechanisms that enable this 
soil to be prepared have only emerged gradually 
over recent years. It was not clear whether the 
earliest changes in incipient metastatic niches 
are accomplished systemically, by soluble fac- 
tors secreted from the primary tumour", or by 
the presence of a small number of disseminated 
tumour cells, or through both processes. Cox 
and colleagues’ exciting discovery provides 
evidence supporting the systemic nature of pre- 
metastatic niche formation and contributes to 
our understanding of systemic regulation of 
cancer progression and metastasis. 

The study is limited by its use of only one 
model of transplantable mammary tumour, 
rather than a genetically engineered model in 
which the breast tumour arises in the mouse. 
However, there is a lack of models in immune- 
competent mice in which such tumours spon- 
taneously metastasize to bones. Another 
limitation is an interesting issue that remains 
unresolved: why is the secretion of LOX by 
hypoxic breast cancer cells predominantly 
linked with bone relapse in patients with ER⁻
breast cancer? Although hypoxia-related sig- 
nalling was previously shown to drive breast 
cancer metastasis’, a detailed dissection of the
link between breast cancer subtype, hypoxia 
and tumour-cell attraction to bone is yet to be 
performed. 

Elucidating the early interactions between 
disseminated tumour cells or their soluble 
products and their new microenvironment 
is an essential prerequisite for the develop- 
ment of effective targeted therapies. Target 
molecules are likely to be organ-specific, 
because the complex components and inter- 
actions of tissues vastly differ in different 
organs (such as bone versus brain). Adding 
to this complexity, this new study suggests 
that biomarkers that predict potential risk 
for organ-specific metastases are also specific 
for different tumour subtypes.

Interestingly, several studies have indicated 
that drugs that prevent bone destruction (such 
as bisphosphonates and the monoclonal anti- 
body denosumab) are efficient co-therapies for 
preventing bone metastasis’. Therefore, the 
knowledge gained from Cox and colleagues’ 
findings may open new horizons in the treat- 
ment of patients with breast cancer after 
removal of the primary tumour. Analysis of the 
expression of LOX may provide both a molec- 
ular tool to stratify patients by their propensity 
for bone metastasis and a target for preventive 
treatment for patients at a higher risk of bone 
metastasis. 
Proton smasher spots
rare particle decays 


The extremely rare decays of particles known as neutral B mesons have been 
observed at CERN’s Large Hadron Collider. The result may be a glimpse of 
physics beyond that of the standard model of particle physics. SEE LETTER P.68 


DARIA ZIEMINSKA 


For more than three decades, physicists
have been looking for the decay of the
‘strange’ B meson particle into a pair of
muons, the heavy cousins of electrons. The
process is incredibly rare, and harder to find
than the famous Higgs particle, the discovery
of which at the Large Hadron Collider
at CERN, near Geneva, Switzerland, was
celebrated worldwide in 2012. The standard
model of elementary particle physics' makes
an exact prediction of the number of particle-
decay events researchers should observe in an 
experiment. Anything more than the predicted 
value means potential trouble for the standard 
model. On page 68 of this issue, researchers 
working on the CMS and LHCb collabora- 
tions’ at the Large Hadron Collider describe 
a joint analysis of data from proton collisions 
that set the decay rate of the strange B meson 
at about three in one billion — in agreement 
with the standard-model prediction. However, 
they find that the decay rate of another type of 
Exclusive networks in the sea


The identification of an exchange of nutrients and signalling molecules between a planktonic alga and a bacterium 
demonstrates that targeted mutualistic interactions occur across domains of life in the oceans. SEE LETTER P.98 


ALEXANDER J. LIMARDO & 
ALEXANDRA Z. WORDEN 


The sunlit surface waters of the ocean are
inhabited by unicellular algae that perform
approximately half of our planet’s
photosynthesis’ and are crucial for sustaining 
Earth's atmosphere. These phytoplankton live 
in a complex milieu with other microorgan- 
isms, many of which rely on the products of 
algal photosynthesis for growth’. Although 
chemical cross-talk between the micro- 
organisms that inhabit the human body or the 
root zone of plants is well established*”, it is 
difficult to imagine similarly intimate inter- 
actions in dilute ocean environments. Indeed, 
an enduring mystery of marine ecology and 
carbon-cycle science is whether there are spe- 
cific mutualistic relationships between ocean 
microbes or whether exchanges are largely 
the result of random encounters between 
released compounds and free-living cells. On 
page 98 of this issue, Amin et al.° describe how 
a widespread free-living alga and a bacterium 
engage in a targeted exchange of nutrients and 
metabolites. This includes transfer of a hor- 
mone found in land plants, although neither 
organism is evolutionarily related to plants. 

The diatoms are a group of eukaryotic algae 
that have important roles in primary produc- 
tion (the generation of organic carbon from 
carbon dioxide) and in marine food chains’. 
Amin et al. studied the diatom Pseudo- 
nitzschia multiseries, which has a complicated 
ecological role because, as well as being a pri- 
mary producer, it can produce the neurotoxin 
domoic acid, which causes amnesic shellfish 
poisoning in humans and other consumers 
as its concentration is increased up the food 
chain. In the current report, the authors focus 
on how P. multiseries growth is affected by the 
activities of bacteria, the identities of bacteria 
that augment its growth, and how the alga or 
bacterium may manipulate the other to its own 
benefit. 

To illuminate these interactions, Amin 
et al. used an impressive array of co-culturing 
experiments, genome sequencing, RNA- 
transcript analyses and metabolite profiling 
in studies extending from the laboratory into 
the wild. In characterizing algae—bacteria rela- 
tionships, the researchers found that, among 
Figure 1 | Coordinated exchanges between a widespread marine alga and a bacterium. Photosynthetic
algae such as diatoms coexist in the marine environment with many different bacteria. Amin et al.° show
that a specific mutualistic interaction occurs between the diatom Pseudo-nitzschia multiseries PC9 and the
bacterium Sulfitobacter SA11. The diatom converts carbon dioxide to organic carbon, which the bacterium
probably accesses in multiple forms, including excreted taurine and other organosulfur compounds such
as dimethylsulfoniopropionate (DMSP); the latter is broken down by the bacterium to the gas DMS.
The bacterium reduces nitrate to ammonium and provides other molecules, which together improve
the growth efficiency of the diatom. The authors also show that P. multiseries produces tryptophan and
Sulfitobacter produces indole-3-acetic acid (IAA), signalling molecules that apparently coordinate the
metabolic activities of the two organisms.


49 bacterial strains isolated from P. multiseries
cultures, members of the genus Sulfitobacter 
had the largest positive effect on the alga’s 
growth. Further testing using Sulfitobacter 
strain SA11 showed algal growth enhance- 
ment occurred for just two of four P. multiseries 
strains examined, and there was no observable 
effect for another diatom genus. Perhaps more 
surprisingly, enhancement of bacterial growth 
was also highly specific, with only some strains 
of P. multiseries increasing the Sulfitobacter's 
growth. Phytoplankton exude organic carbon 
molecules that are assimilated by bacteria, and 
themselves use nutrients remineralized by bac- 
teria’, but there is little evidence that greater 
specificity defines these exchanges. The find- 
ings presented unambiguously show that more 
nuanced interactions occur. 

The authors characterized the mutualistic 
relationship between P. multiseries strain PC9 
and Sulfitobacter strain SA11 in further detail 
(Fig. 1). Gene-expression analysis indicated 
that SA11 uses taurine, an organic compound 
excreted by the diatom, as a carbon source. The 
breakdown of taurine yields sulfite, which is 
notable because several Sulfitobacter strains 
oxidize sulfites as an energy source’. SA11 
also responded to another diatom-derived
organosulfur compound, dimethylsulfonio- 
propionate (DMSP), by upregulating a gene 
that degrades it. By this mechanism, SA11 pre- 
sumably gains another carbon source, acrylate, 
while releasing the volatile gas dimethylsulfide 
(DMS). DMS is considered a climate-active 
gas because it is oxidized to sulfate particles 
around which water vapour can condense’, 
although its contribution to cloud formation 
is minor in the upper atmosphere where the 
largest cloud-related influences on climate 
occur’®. Collectively, the study results sug- 
gest that compound exchanges and signalling 
such as those observed in this diatom-bacteria 
network represent an important link in global 
cycling of both carbon and sulfur. 
Concurrent with its assimilation of diatom- 
derived organic carbon, SA11 secretes ammo- 
nium — a heavily scavenged commodity in 
low-nutrient marine settings because it is the 
most reduced form of nitrogen available. By 
outsourcing nitrate reduction to the bacterium
and acquiring other molecules from it, the
diatom can divert cellular resources towards 
other processes, such as growth. Indeed, the 
authors’ transcriptome analyses indicate that, 


in the presence of SA11, P. multiseries increases
expression of genes associated with photo- 
synthesis and carbon fixation, presumably to 
support the higher growth rates observed, as 
well as to provide organic carbon exudates for 
the bacterial partner. 

But the emerging picture of a diatom- 
bacterium partnership is more complex than 
this simple resource swap. Amin et al. postulate 
that the exchanges between these free-living 
microbes are coordinated through cycling of 
the hormone indole-3-acetic acid (IAA) and 
the amino acid tryptophan. Best known for its 
use by terrestrial plants to direct developmen- 
tal processes such as the growth of new shoots, 
IAA also has a role in signalling between soil 
bacteria and plants’. The researchers dem- 
onstrate that P. multiseries and Sulfitobacter 
SA11 secrete tryptophan and IAA, respectively. 
Moreover, they show that addition of synthetic 
IAA to cultures of P. multiseries stimulates the
diatom’s growth, but that the effect is signifi- 
cantly greater when the IAA-producing bac- 
terium itself is present. This indicates that, 
although IAA promotes diatom cell division, 
additional unidentified factors are involved in 
the positive feedback loop that results in major 
diatom growth enhancement. The authors 
also detected IAA in water samples from five 
North Pacific sites and present transcriptomic 
evidence from field samples for multiple IAA 
biosynthesis pathways, each incorporating dif- 
ferent precursor molecules. Thus, it seems that 
IAA signalling occurs across domains of life 
in both the terrestrial and marine biospheres 
and is probably an ancient mechanism of 
organismal communication. 

Amin and colleagues’ study represents a 
substantial step forward for understanding 
the complex network of interactions between 
phytoplankton and bacteria and provides a 
springboard for development of hypotheses 
on cross-talk between marine microbes. For 
example, the extreme interaction specific- 
ity observed suggests that the consortium of 
bacteria residing in a particular habitat may 
be a major force in structuring the local
phytoplankton community, or vice versa. Moreover,
it seems reasonable to speculate that, in addi- 
tion to IAA and tryptophan, other signalling 
molecules participate in inter- or intradomain 
communication among marine microbes. 

Perhaps the most pressing question we 
are left with is when and where such interac- 
tions occur. Symbioses between diatoms and 
nitrogen-gas-fixing cyanobacteria are 
known", and bacterial attachment to dia- 
toms is often reported during algal blooms 
and in the bloom senescence phase’*”’. For 
decades, scientists have also speculated on the 
potential influence of microscale variability 
of resources in marine environments“ and on 
the significance of the phycosphere’’ — a zone 
around algal cells considered analogous to the 
root zone of plants. Concentrations of algal 
secreted compounds are much higher in the 


phycosphere than in nearby sea water, owing 
to the basic physics of the diffusive boundary 
layer surrounding the cell, thus promoting 
growth of bacteria in this layer’*”’. 

Amin and colleagues’ results suggest that 
the specific growth-enhancing interactions 
observed occur in the phycosphere, but the 
sampling methods and quantitation tech- 
niques needed to directly assess the physical 
nature of these associations are still lacking. 
Exciting times are ahead as scientists develop 
techniques to examine the physical intricacies 
of these mutualistic relationships, their preva- 
lence and structuring roles in marine micro- 
bial communities, and how they might shift 
under environmental change.


Alexander J. Limardo and Alexandra Z. 
Worden are in the Department of Ocean 
Sciences, University of California, Santa Cruz, 
California 95064, USA, and at the Monterey 
Bay Aquarium Research Institute, Moss 
Landing, California. A.Z.W. is also at the 
Canadian Institute for Advanced Research, 


DEVELOPMENTAL BIOLOGY 


Toronto, Canada. 
e-mails: alimardo@mbari.org; 
azworden@mbari.org 


1. Field, C. B., Behrenfeld, M. J., Randerson, J. T. & Falkowski, P. Science 281, 237-240 (1998).
2. Worden, A. Z. et al. Science 347, 1257594 (2015).
3. Thompson, J. A., Oliveira, R. A., Djukovic, A., Ubeda, C. & Xavier, K. B. Cell Reports 10, 1861-1871 (2015).
4. Sukumar, P. et al. Plant Cell Environ. 36, 909-919 (2013).
5. Von Bodman, S. B., Bauer, W. D. & Coplin, D. L. Annu. Rev. Phytopathol. 41, 455-482 (2003).
6. Amin, S. A. et al. Nature 522, 98-101 (2015).
7. Bowler, C., Vardi, A. & Allen, A. E. Ann. Rev. Mar. Sci. 2, 333-365 (2010).
8. Park, J. R. et al. Int. J. Syst. Evol. Microbiol. 57, 692-695 (2007).
9. Stefels, J., Steinke, M., Turner, S., Malin, G. & Belviso, S. Biogeochemistry 83, 245-275 (2007).
10. Cziczo, D. J. et al. Science 340, 1320-1324 (2013).
11. Foster, R. A. et al. ISME J. 5, 1484-1493 (2011).
12. Smith, D. C., Steward, G. F., Long, R. A. & Azam, F. Deep-Sea Res. II 42, 75-97 (1995).
13. Amin, S. A., Parker, M. S. & Armbrust, E. V. Microbiol. Mol. Biol. Rev. 76, 667-684 (2012).
14. Azam, F. Science 280, 694-696 (1998).
15. Bell, W. & Mitchell, R. Biol. Bull. 143, 265-277 (1972).


This article was published online on 27 May 2015. 


Diversity in the 
lymphatic vasculature 


Two studies of the cells that give rise to lymphatic vessels reveal that precursors 
arise from unexpected sources, demonstrating that the origins of this vasculature 
are more diverse than anticipated. SEE ARTICLES P.56 & P.62 


BENJAMIN M. HOGAN & BRIAN L. BLACK 


The lymphatic vasculature is a specialized
network that drains fluid from tissues
and enables immune-cell trafficking
and surveillance throughout the body. This 
major constituent of the circulatory system 
was long considered to be ancillary to the 
blood vasculature, and as such has received 
less attention than its counterpart. As a result,
although the roles of lymphatic vessels in tissue 
maintenance and disease are now well appreci- 
ated!”, their origins have remained contentious 
for more than a century”*. Two studies”® in this
issue provide insights into the origins of the 
lymphatic vasculature. 

In vertebrate embryos, lymphatic vessels 
arise from two pre-existing veins’. Vascular 
endothelial cells that line the walls of these 
cardinal veins turn on genes that direct venous 
cells to become lymphatic endothelial cells 
(LECs), which leave the veins and form the 
lymphatic vessels. In this way, one vascular 
network gives rise to another. But precisely 
how and when a pool of LEC precursors is 
established in the cardinal veins, and whether 
all cells in the early veins are identical, has
remained unclear. 

Zebrafish embryos are transparent, which 
makes them an ideal organism for visualizing 
cell movements. Nicenboim et al.° (page 56) 
traced early LEC development in zebrafish 
and unexpectedly discovered that the cells on 
the dorsal (upper) and ventral (lower) walls of 
the cardinal veins are different. The research- 
ers report that lymphatic precursors sprout 
from the dorsal wall, but that these cells actu- 
ally originate in the ventral wall at an earlier 
developmental stage (Fig. 1). 

By using fluorescence to track single cells, 
the authors found that precursor cells in the 
ventral wall divide, after which one of the 
two daughter cells migrates to the dorsal 
wall. These dorsal daughter cells can then 
give rise to LECs. Interestingly, the cells in 
the ventral wall also contribute to the intes- 
tinal blood vasculature, indicating that they 
can give rise to multiple types of endothe- 
lium. Nicenboim and colleagues showed 
that the fate of these nascent precursor cells 
is controlled by a signalling molecule, Wnt5b, 
which acts from tissues adjacent to the ventral 
Figure 1 | Contributions to lymphatic-vessel development. Lymphatic endothelial cells (LECs), which
derive primarily from veins in the early embryo, line lymphatic vessels. Nicenboim et al.* report that LEC
precursors from the ventral side of embryonic veins divide, before one of the two daughter cells migrates
to the dorsal side of the vein. From there, these cells can differentiate into LECs and migrate to the site
of forming lymphatic vessels. Klotz et al.° found that the LECs that form lymphatic vessels in the heart
originate both from embryonic veins and from other non-venous sources, including the yolk sac.


wall of the vein. This observation points to 
an unexpected instructive role for ventral 
tissues and Wnt signalling in driving cells to 
become LECs. 

It is difficult to pinpoint the stage at which a 
cell becomes committed to its future identity, 
but Nicenboim and colleagues’ observations 
should prompt us to revise our thinking about 
how cellular identity is acquired in vascular 
lineages. Their work suggests that changes 
in cellular identity are probably coupled to 
dynamic cell movements, rather than to step- 
wise changes in gene expression, as current 
dogma suggests. One attractive idea is that the 
range of cell types that undifferentiated precur- 
sors in the vasculature can become is restricted 
progressively at sequential locations during 
vessel development. Future analyses should 
test this and other models. 

Klotz et al.° (page 62) examined a later 
stage of development, when lymphatic vessels 
form in the heart, which, like all organs, uses 
the lymphatic system to maintain normal 
fluid levels and for immune-cell trafficking®. 
An understanding of how lymphatic vessels 
develop in individual organs, however, has 
been even more enigmatic than a definition 
of the origins of the system as a whole. Indeed, 
although the cardiac lymphatic system has 
previously been described’, its development, 
origins and functions have remained unclear. 

These authors found that most of the 
cardiac lymphatic system in mice arises from 
embryonic veins located outside the heart. But 
unexpectedly, they show that many coronary 
LECs do not originate from veins at all (Fig. 1). 
By using various genetic ‘fate-mapping’ tech- 
niques to indelibly label cells, allowing cell 
movements and descendants to be traced, the 
authors observed that roughly 20% of LECs 
in the heart’s lymphatic vasculature originate 
in the yolk sac that surrounds the developing 
embryo. The researchers suggest that these 
yolk-sac cells have a previously unappreciated 
potential to give rise to LECs directly, in
addition to their known roles in forming blood 
cells and vascular endothelial cells. 

All fate-mapping experiments in mice must 
be interpreted cautiously, because the tech- 
niques used to label cells genetically can also 
label other cell types. Nonetheless, Klotz and 
colleagues’ work indicates that cardiac LECs 
originate from different places: embryonic 
veins and at least one other source. Future 
studies that do not use genetic labelling should 
further clarify the origins of cardiac and other 
LECs in developing organs. 

Klotz et al. also investigated the role of 
cardiac lymphatic vessels in repairing the 
heart after a heart attack. They found that 
the injuries sustained from a heart attack in 
mice stimulate the growth of new lymphatic 
vessels, which in turn promote cardiac repair. 
These are previously unknown functions for 
coronary lymphatic vessels, and the authors’ 


findings have implications for the treatment 
of cardiac conditions such as heart attack and 
atherosclerosis. They raise the possibility that 
the activation of LECs in the heart may be a 
way to promote cardiac repair. 

These two studies provide fresh insight into 
our vascular drainage network. Taken together, 
they describe an early source of LEC precur- 
sors in embryonic cardinal veins, reveal an 
unexpected level of diversity in the signals 
that control early LEC development, and 
demonstrate that more cell types than previ- 
ously appreciated contribute to the formation 
of cardiac lymphatic vessels. In the light of 
these observations, the molecular and cellular 
processes that control lymphatic-vessel forma- 
tion seem to be much more diverse than was 
previously thought. The studies promise to 
invigorate future research into this enigmatic 
but vital vascular network.
Precise control of
localized signals 


The tumour-suppressor protein PTEN is mostly found in the cell cytoplasm, 
tethered to endosome vesicles. This localization regulates the enzyme’s activity 
towards specific lipids and influences its control of cell growth. 


VUK STAMBOLIC 


Loss of function of the PTEN tumour-suppressor
protein is a common feature
of many types of human cancer, including
glioblastoma and prostate, kidney, thyroid
and breast tumours’. In addition to genetic
alterations occurring in sporadic tumours,
germline mutations in the gene that encodes 
PTEN cause the group of disorders known 
as PTEN hamartoma tumour syndromes, 
which are characterized by benign tumours 
throughout the body and increased incidence 
of breast, thyroid and brain cancers’. Since the 


discovery of this protein, the subcellular 
localization of PTEN has been investi- 
gated as a plausible means for its regula- 
tion. Writing in Molecular Cell, Naguib 
et al.” examine the biology of the abun- 
dant PTEN pool in the cell cytoplasm. 

Previous work has demonstrated that 
PTEN is found at the cell membrane, 
where it acts as a phosphatase enzyme 
to dephosphorylate phosphatidylinosi- 
tol (3,4,5)-triphosphate (PIP3), a lipid 
molecule that is formed by the activity 
of the class I phosphatidylinositol-3-OH 
kinase (PI3K) enzymes”. PIP3 dephos- 
phorylation inhibits its activity in the 
cancer-promoting signalling pathway 
that is initiated by class I PI3K enzymes 
and that controls the enzymes protein 
kinase B (PKB; also known as Akt) and 
mammalian target of rapamycin com- 
plex 1 (mTORC1) (Fig. 1). Inhibition 
of this pathway is thought to be at the 
crux of PTEN’s function as a tumour 
suppressor’, although more-recent evi- 
dence also points to PIP3-independent 
tumour-suppressive functions. PTEN is 
also subject to a complex mechanism of 
nuclear localization and retention, and 
it can exert its phosphatase activity in 
the nucleus to contribute to maintain- 
ing genome integrity®. And, remarkably, 
a secreted form of PTEN has also been 
discovered, enabling this tumour sup- 
pressor to function beyond the cells in 
which it is produced’. 

Using high-resolution microscopy 
of fluorescently tagged PTEN, Naguib 
et al. show tight association of cyto- 
plasmic PTEN with the endosomes 
(membrane-delineated intracellular vesicles
originating from the cell membrane) assembled 
alongside the cellular microtubule network that 
forms part of the cytoskeleton. The endosomal 
localization of PTEN is ensured by the interac- 
tion of its CBR3 loop with phosphatidylinosi- 
tol 3-phosphate (PI3P; not to be confused with 
PIP3), an endosomal phospholipid produced 
by the activity of the enzyme Vps34, which is a 
class III PI3K (ref. 8). Evolutionarily conserved
among vertebrate species, the PTEN CBR3 loop 
has been shown to mediate general PTEN bind- 
ing to other phosphoinositide molecules and 
cell membranes, and was found to be essential 
for the protein's growth-suppressing activity™*. 

Using fluorescently tagged PTEN fusion 
proteins, Naguib and colleagues acquired 
data suggesting that CBR3-loop-mediated 
PTEN localization to endosomes plays a sig- 
nificant part in suppressing PKB signalling in 
cells. In support of this, they found that PTEN 
cytoplasmic organization was disrupted in 
Vps34-deficient mouse embryonic fibroblast 
cells, as was the localization of the pleckstrin 
homology domain of PKB that binds PIP3. It 
is worth noting that previous work in other cell 
systems’ found that disruption of Vps34 did 


Figure 1 | Endosomal PTEN in PIP3 control. The PI3K signalling 
pathway, which controls cell proliferation, is triggered when the 
enzyme PI3K phosphorylates the cell-membrane phospholipid 
PIP2 to generate PIP3, which then activates protein kinase B (PKB), 
eventually leading to regulation of many of its downstream targets, 
including mTORC1. The tumour-suppressor protein PTEN, a lipid 
phosphatase enzyme, inhibits this pathway by dephosphorylating 
PIP3. Although PTEN transiently associates with the cell 
membrane, most PTEN resides in the cytoplasm. Naguib et al.” 
show that cytoplasmic PTEN is associated with endosomes, where 
it is attracted by direct binding to another phospholipid, PI3P, and 
dephosphorylates endosomal PIP3, contributing to downstream 
regulation of the PI3K pathway. Endosomes contain another lipid 
phosphatase, INPP4B, which under conditions of increased cellular
PIP3 levels, may also dephosphorylate it to regulate PI3K signalling. 


not affect steady-state or insulin-stimulated 
activity of PKB, possibly reflecting cell- or con- 
text-specific dependencies of the PI3K-PKB 
network on the endosomal PI3P state. 
Considering that a proportion of the cell 
membrane is internalized through the process 
of endocytosis, Naguib et al. propose that PTEN 
association with PI3P favours dephosphorylation
of endosomal PIP3, which is generated by
PI3K activity on the cell membrane. Consistent 
with this idea, the authors found that liposomes 
(artificial vesicles with a phospholipid sur- 
face) containing both PIP3 and PI3P were 
more efficiently dephosphorylated by PTEN 
than the same vesicles containing only PIP3. 
Although PI3P may directly activate PTEN, it is 
more plausible that interaction between PTEN 
and PI3P increases the local concentration of 
PTEN, promoting its dephosphorylation of 
PIP3. Supporting such a view is the authors’ 
demonstration that fusion to the PTEN amino 
terminus of an endosome-targeting FYVE
domain (thus mimicking PI3P binding) con- 
fers an increase in PTEN activity towards the 
liposomes (containing both PIP3 and PI3P), 
even in the case of PTEN mutants with CBR3 
loops that do not bind PI3P. Considering that 
the FYVE-domain fusions were at the
N terminus, a location physically distinct 
from the CBR3 loop, PTEN positioning 
towards the PIP3 substrate is probably 
not affected by PI3P binding. 

Other moieties have been found 
to modulate PTEN’s interaction with 
cellular membranes. For example, 
attachment of a SUMO protein at the 
lysine-266 amino-acid residue in the 
C2 domain of PTEN is thought to
facilitate PTEN’s association with the 
cell membrane; also, PTEN’s N termi- 
nus contains a phosphatidylinositol 
(4,5)-bisphosphate (PIP2) binding 
sequence, which can further contribute
to its membrane interactions.
These mechanisms are expected to act 
in concert with CBR3 loop binding to 
PI3P to fine-tune PTEN localization. 
Moreover, dynamic changes in the car- 
boxyl terminus of PTEN are also likely 
to influence its subcellular localization. 
This region of PTEN is thought to fold 
over the C2 domain, particularly when 
phosphorylated at a cluster of four 
serine or threonine residues (380-385 
in human PTEN). This conformation
renders PTEN inactive towards
PIP3 (ref. 5). Other post-translational 
modifications, including other sites 
of phosphorylation, acetylation and 
binding of the PDZ-domain bind- 
ing site at the extreme C terminus of 
PTEN, may further contribute to these 
interactions.

It is conceivable that the newly 
discovered PI3P-dependent endosomal 
localization of PTEN coordinates a spe- 
cific environment for its regulation, favouring 
control by specialized upstream inputs and 
allowing precise management of endosomal 
PIP3 by PTEN. Reflecting even further complexity
in PIP3 management, recent evidence
reveals that endosomes also harbour another 
lipid phosphatase, INPP4B. Normally viewed 
as an enzyme that dephosphorylates phos- 
phatidylinositol molecules at the 4’ position 
of their inositol ring, INPP4B also displays 
limited activity towards the 3’ position,
which could be particularly relevant in cells 
that lack PTEN and hence display increased 
abundance of PIP3. As our understanding of 
the mechanisms governing PTEN’s subcellular 
localization increases, so does the realization 
that multiple, sometimes redundant mecha- 
nisms exist to control PIP3, with profound 
implications for downstream signalling. Such 
knowledge will be instrumental in the devel- 
opment of effective therapeutic options in 
cancers featuring loss of PTEN or activating 
mutations of the PI3Kα subunit, encoded by
the PIK3CA gene.
Pluto leads the way
in planet formation 


Images from the Hubble Space Telescope cast new light on the orbits, shapes 
and sizes of Pluto’s small satellites. The analysis comes just before a planned 
reconnaissance by the first spacecraft to visit them. SEE ARTICLE P.45 


SCOTT J. KENYON 


Pluto and its large moon Charon
make up the only ‘binary planet’ in the
Solar System. With a mass roughly 11%
that of Pluto, Charon orbits the binary system’s
centre of mass at a distance of 17,500 kilometres
every 6.4 days. Over the past decade,
images from the Hubble Space Telescope 
(HST) have revealed four circumbinary 
satellites with orbital periods of 20-40 days 
and masses roughly 0.001% (or less) of Pluto’s 
(Fig. 1). Before the discovery of the innermost 
and least massive of these moons, Styx, dynamical
studies had suggested that the other three,
Nix, Kerberos and Hydra, are packed as closely 
together as possible, with no room for other 
stable satellites between their orbits. 
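The barycentre figures quoted above can be checked with a short calculation. The sketch below uses only the rounded numbers in the text (a mass ratio of about 11% and a Charon-to-barycentre distance of 17,500 km) and the two-body balance condition; the function and variable names are illustrative, and the results inherit the rounding of the inputs.

```python
# Two-body barycentre sketch using only the rounded figures quoted in the text.
MASS_RATIO = 0.11        # Charon mass / Pluto mass (approximate, from the text)
R_CHARON_KM = 17_500.0   # Charon's distance from the centre of mass (from the text)


def barycentre_distances(mass_ratio: float, r_secondary_km: float) -> tuple[float, float]:
    """Return (primary's distance from the barycentre, total separation) in km.

    Uses the balance condition m_primary * r_primary = m_secondary * r_secondary,
    so r_primary = (m_secondary / m_primary) * r_secondary.
    """
    r_primary_km = mass_ratio * r_secondary_km
    separation_km = r_primary_km + r_secondary_km
    return r_primary_km, separation_km


r_pluto_km, separation_km = barycentre_distances(MASS_RATIO, R_CHARON_KM)
print(f"Pluto's distance from the barycentre: {r_pluto_km:.0f} km")
print(f"Pluto-Charon separation: {separation_km:.0f} km")
```

With these inputs, Pluto sits roughly 1,900 km from the centre of mass. Pluto's radius is about 1,190 km, so the barycentre lies outside Pluto itself, which is why the pair is described as a binary.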

On page 45 of this issue, Showalter 
and Hamilton present an analysis of
all available HST images of the system, 
and derive new orbits and masses for 
the moons. They also derive limits on 
the moons’ previously unknown shapes 
and reflectivities. As well as confirming 
that the moons are in extremely tight 
orbits, the authors infer new relation- 
ships between the orbital periods of 
satellite pairs. These results may help 
us to understand how planets and 
satellites form and remain on stable 
orbits for billions of years. 

The architecture of Pluto’s small 
satellites closely resembles that of 
several planetary systems discovered
by the Kepler space observatory
(Fig. 2). In these systems, every object 
has a gravitational sphere of influ- 
ence that prevents other objects from 
orbiting nearby. The more massive the 
object, the larger its sphere of influ- 
ence. When the gravitational spheres 
of neighbouring objects nearly overlap, it is
impossible to place other bodies on stable 
orbits between them. In tightly packed sys- 
tems, the spheres of several (perhaps all) of the 
objects almost overlap. Small particles, such as 
interplanetary dust, might orbit in these inter- 
mediate regions, but large objects cannot. 
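One common way to put a number on a 'sphere of influence' is the Hill radius, r_H = a (m / 3M)^(1/3), for a body of mass m orbiting a central mass M at distance a. The article does not say which definition the cited studies use, so the sketch below is an assumption, and the orbital distances and mass ratios in it are hypothetical values chosen only to show the scaling.

```python
def hill_radius(a_km: float, mass_ratio: float) -> float:
    """Hill radius in km for a body at orbital distance a_km.

    mass_ratio is m / M (orbiting body over central mass).
    Standard approximation: r_H = a * (m / (3 M)) ** (1/3).
    """
    return a_km * (mass_ratio / 3.0) ** (1.0 / 3.0)


def spacing_in_mutual_hill_radii(a1_km: float, a2_km: float,
                                 ratio1: float, ratio2: float) -> float:
    """Orbital separation of two bodies in units of their mutual Hill radius."""
    mutual_km = 0.5 * (a1_km + a2_km) * ((ratio1 + ratio2) / 3.0) ** (1.0 / 3.0)
    return abs(a2_km - a1_km) / mutual_km


# Hypothetical inputs, for illustration only.
A_KM = 50_000.0
light = hill_radius(A_KM, 1e-5)
heavy = hill_radius(A_KM, 8e-5)   # eight times the mass
print(f"Hill radius grows by a factor of {heavy / light:.2f} for 8x the mass")

spacing = spacing_in_mutual_hill_radii(45_000.0, 50_000.0, 1e-5, 1e-5)
print(f"Separation: {spacing:.1f} mutual Hill radii")
```

The cube-root dependence means that a body eight times more massive has a sphere of influence only twice as large. A small separation measured in mutual Hill radii corresponds to the 'tightly packed' case described in the text.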
These tightly packed systems place severe 
constraints on theories of planetary-system 
formation. According to current thinking, 
planets (and satellites) start as small seeds in 
a disk or ring surrounding the star (or planet) 
at the centre. These seeds grow by agglomerat- 
ing other small solid objects along their orbits. 
Eventually, growing bodies feel the gravita- 
tional tugs of others in the system. Continued 
growth results in ‘overpacking’, whereby the




Figure 1 | Pluto and its satellites. This optical image, taken by 
the Hubble Space Telescope, depicts Pluto, its large moon Charon 
and four smaller moons Styx, Nix, Kerberos and Hydra. The image 
was taken in July 2012 when Styx was discovered. Showalter and 
Hamilton have used such images to derive several properties of
Styx, Nix, Kerberos and Hydra. The ellipses shown are illustrative 
paths of the moons around the centre of mass of the system. 
spheres of influence of many growing objects
in orbit overlap. As the gravitational forces 
between these objects build, their orbital 
motions become chaotic, and further growth 
is promoted through mergers of objects. When 
only a few planets (or satellites) remain, they 
settle into nearly circular orbits and their 
spheres of influence do not overlap. How some 
systems end up with objects in closely packed 
orbits is an open question. 

Current hypotheses on the formation of
the Pluto-Charon system focus on a giant 
impact in which a proto-Charon collided 
with a proto-Pluto to form a binary planet 
surrounded by an expanding ring of debris. 
Pre-existing moons might have survived the 
impact and new moons may have grown out 
of small particles in the debris. As well as 
having ended up in tightly packed orbits, the 
four moons that are the end product of this 
process (Styx, Nix, Kerberos and Hydra) exist 
in orbits with orbital periods in an observed 
ratio of roughly 3:4:5:6 times that of Charon,
respectively. High-quality measurements of 
the orbits and masses of all the moons in the 
system are needed to understand how this 
process works. 

To constrain these properties, Showalter 
and Hamilton measure precise positions of 
the moons on the HST images. Assuming 
that the four moons follow elliptical orbits 
around Pluto-Charon, the authors
present detailed modelled fits to their 
positions that yield the period, orien- 
tation (the inclination of the orbital 
plane with respect to the orbital plane 
of Pluto-Charon) and ellipticity of each 
orbit. Variations in the brightness of the 
moons at different times along their 
orbits allowed the authors to derive 
estimates of their sizes, shapes, reflec- 
tivities and masses. They conclude that 
the moons have orbital-period ratios 
of 3.16:3.89:5.03:5.98 — close to, but 
not quite, integers. Curiously, the syn- 
odic period of Styx and Nix (the time 
interval between orbital phases when 
two moons line up on the same side of 
their planet) is almost exactly 1.5 times 
the synodic period of Nix and Hydra. 
How this ‘three-body resonance’ devel- 
oped during the growth of the moons 
is unclear.
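The synodic-period relationship can be reproduced from the period ratios quoted above. This is a minimal sketch that takes the rounded ratios at face value (in units of Charon's 6.4-day period) and uses the standard relation 1/S = 1/P_inner - 1/P_outer; the names are illustrative, and the published analysis works from fitted orbits rather than rounded ratios.

```python
CHARON_PERIOD_DAYS = 6.4   # from the text

# Orbital periods in units of Charon's period, as quoted in the text.
PERIOD_RATIOS = {"Styx": 3.16, "Nix": 3.89, "Kerberos": 5.03, "Hydra": 5.98}


def synodic_period(p_inner: float, p_outer: float) -> float:
    """Time between successive alignments of two bodies orbiting the same centre.

    Follows from the difference in angular rates: 1/S = 1/P_inner - 1/P_outer.
    """
    return 1.0 / (1.0 / p_inner - 1.0 / p_outer)


s_styx_nix = synodic_period(PERIOD_RATIOS["Styx"], PERIOD_RATIOS["Nix"])
s_nix_hydra = synodic_period(PERIOD_RATIOS["Nix"], PERIOD_RATIOS["Hydra"])
ratio = s_styx_nix / s_nix_hydra

print(f"Styx-Nix synodic period:  {s_styx_nix * CHARON_PERIOD_DAYS:.1f} days")
print(f"Nix-Hydra synodic period: {s_nix_hydra * CHARON_PERIOD_DAYS:.1f} days")
print(f"Ratio of the two: {ratio:.2f}")
```

Even with the rounded inputs, the ratio of the two synodic periods comes out close to 1.5, in line with the relationship described in the text.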

The shapes and compositions of 
Pluto-Charon’s four moons provide
crucial tests of models of planet and 
RESEARCH CONTINUITY


Be prepared 


When a key member of a team is lost, the work does not have to come to an end.


BY HANNAH HOAG 


When Michael Pisaric was two
years into his PhD, he travelled to 
Watson Lake in Canada with his 


supervisor, Julian Szeicz, and graduate student 
Tammy Karst-Riddoch, to collect sediment 
from several lakes in Yukon and in northern 
British Columbia. Szeicz was a geographer at 
Queen’s University in Kingston, Canada, who 
worked on reconstructing ancient climates. 
The trio hoped that the samples would reveal 
how climate had influenced tree-line dynam- 
ics in the region over the past 10,000 years. 
As they trudged through the snow and 
negotiated a series of switchbacks, a snow ava- 
lanche roared down the hill and covered them. 
When it cleared, Pisaric was buried up to his 


shoulders and there was no sign of Szeicz. 
Karst-Riddoch dug Pisaric out and they ran 
down the hillside to call the Royal Canadian 
Mounted Police, who recovered Szeicz’s body 
later that day. 

These sorts of tragedies are rare, devastating 
and hard to deal with. The loss of a principal 
investigator owing to an accident or illness can 
not only set junior lab members adrift emo- 
tionally, it can also put their careers in jeop- 
ardy. But they can establish ways to keep their 
careers from becoming unhinged (see ‘Set- 
back savers’). Collaborative networks can help 
to keep funding in place, and a hard look at 
the progress of their research and career path 
will help them to work out where to go next. 

But first they must work through the 
emotional toll of the death or diagnosis. “You 
have to take care of yourself, and that may
mean moving away from your work for an 
extended period of time,” says Pisaric, who 
did not return to research for six months after 
the event. “Come back when you are com- 
fortable, not because of the pressure from 
other people.” When Pisaric did return, he 
avoided his PhD research. Instead, he busied 
himself with data from his master’s degree on 
changes to the Siberian tree line over the past 
10,000 years, later publishing two papers.
He found a new supervisor and a mentor and 
eventually returned to full speed. 

When Tony Pawson, a cell biologist at the 
Lunenfeld-Tanenbaum Research Institute at 
Mount Sinai Hospital in Toronto, Canada, 
died unexpectedly in August 2013, his lab 
group consisted of about 30 people. “They
were in shock,” says Jim Woodgett, the
institute’s director of research. They were
referred to psychologists to address the grief
and stress that they were experiencing; many
had failed to recognize how dependent they
were on this one person, he says.

Nick Haddad says that his colleagues acted as a
safety net when he had a debilitating accident.

It was stressful, recalls Greg Findlay, who 
now runs his own lab in embryonic stem-cell 
signalling at the University of Dundee, UK. “It 
was a terrible tragedy, yet we also had to think 
about, ‘Where did our lives go from here?’”
Those who fared best were the ones who had 
already formed professional relationships 
with other scientists, mostly senior research- 
ers at their own or other institutions, who gave 
them lab space, advocated for resources and 
fought on their behalf to ensure that they had 
enough time to recover emotionally. The same 
was true for Pisaric and others who benefited 
from a mentor who looked out for their emo- 
tional wellbeing and helped them to secure the 
financial resources and academic support they 
needed to continue their PhD work. 


NETWORK BUILDERS 

Graduate students and postdocs can become 
wrapped up in the race for results and pub- 
lications, and often do not make building 
these networks a priority. But even their own 
health problems can stall the publications 
and experiments that are crucial to building 
a career. Establishing ties within an insti- 
tute — and outside its walls — is important 
for career development. Connections made at 
conferences and online can turn into fruitful 
collaborations and job opportunities, and a
much-needed safety net should their lab have 
to shut down unexpectedly.

Nick Haddad, an ecologist at North 
Carolina State University in Raleigh, credits 
his collaborators for covering for him during 
the time that he was unable to work. He had 
just 4 days left to refine a paper with 25 co- 
authors when he had an accident that put him 
out of action for almost two months. 

His collaborators contacted the journal 
editor and pulled together the pieces left dan- 
gling. The paper was eventually published in 
Science Advances. Haddad sees the article as
tangible evidence of his safety net. “I cannot 
remember being this excited about a paper, 
except maybe my first,” he says. “We like to 
think of ourselves as independent scientists 
and academics, except that it is not really 
true. We are a community of scholars, and 
my own success is not mine, but the success 
of a group of people who are interacting and 
collaborating.”

As well as collegial support, trainees need 
to ensure that their financial affairs are in 
order (see ‘Control your assets’). Graduate 
students, especially, tend to be supported by 
their supervisor's funding. Pisaric, who is 
now a physical geographer at Brock Univer- 
sity in St. Catharines, Canada, recalls that the 
initial response from the funding agency was 
to terminate Szeicz’s grant and claw back the 
unused money. It was an enormous blow on 
top of all the other emotional stress he was 
experiencing. “I was left wondering,” he says,
“how do I finish my PhD with no funding?” 

Research grants depend on the terms that 
the sponsor lays out in the funding agreement. 
Many of the grants from US and Canadian fed- 
eral funding agencies are contracts between 
the agency and a laboratory's principal inves- 
tigator. The agency’s decision to support a 


SETBACK SAVERS 
Beat the unexpected 


Illness and death can catch people by 
surprise, but there are ways to mitigate 
some of the effects. 


• Set up support networks. It is good to
have advocates nearby, and connections
outside your home turf are also
important.

• Form good habits. Take detailed notes,
scan them and, especially for field
scientists, preserve an offsite copy.

• Know where all the data related to
your research is stored, and establish
access to it early on.

• Have conversations with colleagues
about your research and your career
goals.

• Remember to take care of your own
emotional health. H.H.
project therefore rests on the track record of
the scientists leading it, not just on the idea, so 
the grant can be terminated if the recipient is 
no longer able to carry out the research or to 
meet other requirements. Like other agencies, 
the US National Science Foundation tries to be 
flexible when a grantee needs to step back from 
a project, says Dana Topousis, acting head of 
the foundation's office of legislative and public 

affairs in Washington DC. In some cases,
the project can be transferred to a
co-principal investigator.
In the case of prolonged absence
owing to illness, these
agencies have provisions in place that allow the 
principal investigator to postpone or transfer 
the grant to a colleague. “I know of individuals 
who have put their grants on hold for chemo- 
therapy or to care for someone in their family 
who is very ill,” says Judith Chadwick, assistant 
vice-president of research services at the Uni- 
versity of Toronto. “These are human relation- 
ships, and there is always sympathy.” 

In Pisaric’s case, his department encouraged 
him to draw up a budget that would allow him 
to complete crucial aspects of his research 
and cover conference expenses and lab costs, 
such as those related to sample analysis. The 
department then worked with the funding 
agency to secure some of the financial support 
he needed from Szeicz’s grant. 


TIME FOR COMPROMISE 

But other grants, including those for 
infrastructure, or support that comes from 
industry, may not be as flexible. When his 
supervisor passed away in the third year of 
his PhD at the Kennedy Institute of Rheu- 
matology at the University of Oxford, UK, 
Adam Cribbs found that his own stipend 
remained intact, but other funding in the lab 
disappeared. That meant he was no longer 
able to do some planned experiments with
a price tag of close to £10,000 (US$15,654),
but with a few compromises he still managed to
finish his PhD on time.

Unexpected disasters can also bring truths 
to the surface and give trainees a chance to 
re-evaluate the direction of their careers. 
They may choose to move into another area 
of research or even away from science. 

Marc Chrétien was six years into his PhD 
in laboratory medicine at the University of 
Toronto when his supervisor died of cancer. 
Chrétien had been developing a method to 
study the intracellular response of endothe- 
lial cells to the stress created by blood flow. 
He says that instead of one person stepping 
in, five departmental scientists tried to achieve 
consensus on the direction of his research and 
his readiness to write up his thesis. “Emotion- 
ally, I was completely drained and exhausted,” 
he says. As a result, Chrétien decided to switch 




SCIENTIFIC BEQUESTS 


Control your assets 


When a principal investigator (PI) has to
leave his or her job suddenly, there can 
be squabbles over who gets the samples. 
But the effects are likely to be lessened 
and easier to circumnavigate if labs have 
carefully catalogued all the specimens, 
reagents and technologies, such as 
transgenic mouse lines or proprietary 
imaging tools. 

In many cases, these resources are 
considered the property of the institution, 
so starting early in their employment, PIs
should make sure that they manage them 
in such a way that would give the rest of 
the research community access to them in 
the event of the PI’s absence.

Scientists who are not bound by 
intellectual-property policies should make 
a detailed inventory of the scientific assets 
they might wish to distribute, says Ron 
Weiss, a partner at the Massachusetts law 
firm Bulkley Richardson, who manages 
estates and estate planning for scientists 
and others. Ownership depends largely 
on the terms of the funding and on the 
investigator’s contract, but some items 
may have been created or collected 
before the scientist joined the university 
or institute. “Understand the policies 
of your employment, and exactly what 
your relationship is. Usually you are an 
employee, but sometimes you are not. 
Scientists can leave a boatload of trouble 
if they don’t adhere to the policies and 
someone else benefits at the expense of 
the institution that had the rights.” 

Scientists working at government 


tracks and applied to medical school. When 
he was accepted, he withdrew from the PhD 
programme and is now a second-year medical
resident at McGill University in Montreal,
Canada. He has already published a
paper from his graduate work and aims to
publish another in the future. 

Cribbs, too, found a new direction. As he 
wrapped up his PhD research, he realized 
that he lacked the knowledge to properly 
analyse some of the data he was generating. 
After he finished his PhD, he applied for and 
got a UK Medical Research Council fellow- 
ship in bioinformatics, which is designed to 
train biologists in computational biology. 
Although his interest in bioinformatics was 
spurred by his supervisor, he says that he 
probably would not have changed course so 
dramatically and sought additional training 
had he not become much more independent 
than his peers. “I'm not sure I would have 
tried something new if I hadn't developed 


laboratories or with private companies 
are unlikely to own much of their data. 
But those who work independently and 
who have taken steps to protect their 
intellectual property will probably have 
assigned all the rights to an entity such as 
a limited-liability corporation, says Weiss. 
In the event of the scientist’s death, the 
entity could then be sold to a pre-chosen 
buyer, and the research materials could 
be bequeathed through a memorandum 
referenced in a will. 

Another approach to managing 
specimens is to distribute the goods up 
front. Josh Drew, a lecturer at Columbia 
University in New York, studies the 
evolution and conservation of coral-reef 
fish across the southwestern Pacific 
Ocean. For his fieldwork, he collects fish, 
clips a small segment of gill for DNA 
analysis and stores the fish in formalin. 
Once home, he donates the specimens to 
the American Museum of Natural History 
in New York so that others can study 
them long after he has left academia. 

Drew admits that when he started the 
scheme he had not been thinking of what 
would happen to the specimens if he 
died suddenly or had to cope with a long- 
term illness. But he recognizes that his 
actions would help to cover his students 
and colleagues if that should happen. 
Drew has placed a two-year moratorium 
on access to the samples so that he has 
time to publish his research. “If I don’t
publish within two years, that’s on me,” 
he says. H.H. 


this confidence,” he says. “I collaborated
with quite a few people and found out my 
strengths and weaknesses.” 

Such experiences are difficult and 
traumatic, but there can also be constructive
outcomes. “It changed me, I grew up, it
made me a better scientist,” says Cribbs. “If
you don’t ask for help you don’t get it, and
that can make the difference between finishing
and not finishing.”


Hannah Hoag is a freelance writer in 
Toronto, Canada. 
CAREERS


TRADE TALK 
Quality wrangler 


After finishing her postdoc in chemical
biology at Stanford University, California,
Leslie Cruz took a job in regulatory
affairs at Alexza Pharmaceuticals
in Mountain View, California. She
explains how she continues to use the skills she
learned in the laboratory.


What does it take to leave the bench? 
The hardest thing for me was to realize that I 
wasn't happy. In graduate school, I would occa- 
sionally question my career path but was always 
led back to research in the laboratory. 


What changed? 

My postdoc adviser directed me to the university
career office, which recommended Career
Opportunities in Biotechnology and Drug Development
(Cold Spring Harbor Laboratory Press, 2008). I read it
cover to cover and took every quiz about how 
one’s personality would be suited to different 
areas of the pharmaceutical industry. To my 
surprise, my results were the worst for discovery 
research and highest for regulatory affairs and 
project management. 


Does your role use your scientific training? 

I use it every day. I read a lot of ‘quality docu- 
ments’ — regulatory submissions to establish 
that our pharmaceutical products are made 
using exacting procedures and have passed rig- 
orous tests. I can see the trends in the data, read 
the graphs and methods and understand them. 


What lessons did you learn from the lab? 

It’s not only what I learned but what I did: I wrote
numerous grant applications. The important 
part of that was that I loved it, the reading and 
reviewing and documentation. That's what I do 
now, only with submission documents for reg- 
ulatory agencies. The other part that I learned 
was working with people. At my job inter- 
view, people kept asking what I did outside of 
conducting experiments — they wanted to 
know that I had the skills to influence others. In 
my graduate programme, I was always the lab’s
contact for environmental-health and safety
compliance, and worked with everyone to make
sure that they were doing their training and
paperwork. I had no idea that this would help me
to get this job. I just did it because I enjoyed it.


INTERVIEW BY MONYA BAKER 


This interview has been edited for length and clarity; 
see go.nature.com/vl1igx for more. 